Preface


In this book, four basic areas of discrete mathematics are presented: Counting and Listing 
(Unit CL), Functions (Unit Fn), Decision Trees and Recursion (Unit DT), and Basic 
Concepts in Graph Theory (Unit GT). At the end of each unit is a list of Multiple Choice 
Questions for Review. These questions (answers provided) represent key ideas and should be 
returned to frequently as your study progresses. At the end of each section within a unit you 
will find a set of exercises. Try to solve them and check your answers against those provided at 
the end of the book. 


Concepts from discrete mathematics are used in many different fields (e.g., computer programming,
engineering, biology, economics, operations research, sociology). Each field has its own
special terminology. In this book, we use the powerful and universal language of mathematics to 
unify these concepts and thus make available a much larger knowledge base to specialists. 


The subjects presented here represent one of the most enjoyable and entertaining areas of mathematics.
If possible, take your time and enjoy the problems. For most readers, the units of study
can be taken in any order. The comprehensive index provided will allow you to fill in gaps in 
your knowledge from other parts of the book. 
Unit CL


Basic Counting and Listing 


Section 1: Lists with Repetitions 


We begin with some matters of terminology and notation. Two words that we shall often 
use are set and list. (Lists are also called strings.) Both words refer to collections of objects. 
There is no standard notation for lists. Some of those in use are 


apple banana pear peach 
apple, banana, pear, peach 


and (apple, banana, pear, peach).


The notation for sets is standard: the items are separated by commas and surrounded 
by curly brackets as in 
{apple, banana, pear, peach}. 


The curly bracket notation for sets is so well established that you can normally assume it 
means a set — but beware, some mathematical software systems use { } (curly brackets) 
for lists. 


What is the difference between a set and a list? “Set” means a collection of distinct 
objects in which the order doesn’t matter. Thus 


{apple, peach, pear} and {peach, apple, pear} 


are the same sets, and the set {apple, peach, apple} is the same as the set {apple, peach}. 
In other words, repeated elements are treated as if they occurred only once. Thus two sets 
are the same if and only if each element that is in one set is in both. In a list, order is 
important and repeated objects are usually allowed. Thus 


(apple, peach), (peach, apple) and (apple, peach, apple)


are three different lists. Two lists are the same if and only if they have exactly the same 
items in exactly the same positions. Thus, “sets” and “lists” represent different concepts: 
A list is always ordered and a set has no repeated elements. 


Example 1 (Using the terminology) People, in their everyday lives, deal with the 
issues of “order is important” and “order is not important.” Imagine that Tim, Jane, 
and Linda are going to go shopping for groceries. Tim makes a note to remind himself 
to get apples and bananas. Tim’s note might be written out in an orderly manner, or 
might just be words randomly placed on a sheet of paper. In any case, the purpose of 
the note is to remind him to buy some apples and bananas and, we assume, the order in 
which these items are noted is not important. The number of apples and bananas is not 
specified in the note. That will be determined at the store after inspecting the quality of 
the apples and bananas. The best model for this note is a set. Tim might have written 
{apples, bananas}. We have added the braces to emphasize that we are talking about
sets. Suppose Jane wrote {bananas, apples} and Linda wrote {apples, bananas, apples}. 
Linda was a bit forgetful and wrote apples twice. It doesn’t matter. All three sets are 
the same and all call for the purchase of some apples and some bananas. If Linda’s friend 
Mary had made the note {peaches, bananas, oranges} and Linda and Mary had decided to 
combine their notes and go shopping together, they would have gone to the store to get 
{apples, peaches, bananas, oranges}. 


There are times when order is important for notes regarding shopping trips or daily 
activities. For example, suppose Tim makes out the list (dentist, bookstore, groceries). It 
may be that he regards it as important to do these chores in the order specified. The 
dentist appointment may be at eight in the morning. The bookstore may not be open 
until nine in the morning. He may be planning to purchase milk at the grocery store and 
does not want the milk to be sitting in the car while he goes to the bookstore. In a list 
where order matters, the list (dentist, bookstore, groceries, dentist) would be different than 
(dentist, bookstore, groceries). The first list directs Tim to return to the dentist after the 
groceries, perhaps for a quick check that the cement on his dental work is curing properly. 


In addition to the sets and lists described above, there is another concept that occurs
in both everyday life and in mathematics. Suppose Tim, Jane, and Linda happen to go to the
grocery store and are all standing in line at the checkout counter with bags in hand
containing their purchases. They compare purchases. Tim says, “I purchased 3 bananas and
2 apples.” Jane says, “I purchased 2 bananas and 3 apples.” Linda says, “I purchased 3 apples
and 2 bananas.” Jane and Linda now say in unison, “Our purchases are the same!” Notice that
repetition (how many bananas and apples) now matters, but as with sets, order doesn’t matter
(Jane and Linda announced their purchases in different order but concluded their purchases
were the same). We might use the following notation: Tim purchased {2 apples, 3 bananas}, Jane
purchased {3 apples, 2 bananas}, Linda purchased {2 bananas, 3 apples}. Another alternative is
to write {apple, apple, banana, banana, banana} for Tim’s purchase. All that matters is the
number of apples and bananas, so we could have written


{apple, banana, apple, banana, banana} 


for Tim’s purchase. Such collections, where order doesn’t matter but repetition does
matter, are called multisets in mathematics. Notice that if Tim and Jane dumped their
purchases into the same bag they would have the combined purchase {5 apples, 5 bananas}.
Combining multisets requires that we keep track of repetitions of objects. In this chapter,
we deal with sets and lists. We will have some brief encounters with multisets later in our
studies. □


To summarize the concepts in the previous example: 


List: an ordered collection. Whenever we refer to a list, we will indicate whether the 
elements must be distinct. 


Set: a collection of distinct objects where order does not matter. 


Multiset: a collection of objects (repeats allowed) where order does not matter.

Note: A list is sometimes called a string, a sequence or a word. Lists are sometimes called
vectors and their elements components.


The terminology “k-list” is frequently used in place of the more cumbersome “k-long list.” 
Similarly, we use k-set and k-multiset. Vertical bars (also used for absolute value) are used 
to denote the number of elements in a set or in a list. We call |A| “the number of elements 
in A” or, alternatively, “the cardinality of A.” For example, if A is an n-set, then |A| = n. 


We want to know how many ways we can do various things with a set. Here are some 
examples, which we illustrate by using the set S = {x, y, z}.


1. How many ways can we list, without repetition, all the elements of S? That is,
how many ways can we arrange the elements of S in a list so that each element of S
appears exactly once in the list? For S = {x, y, z}, there are six ways: xyz, xzy,
yxz, yzx, zxy and zyx. Notice that we have written the list (x, y, z) simply as xyz
since there is no possibility of confusion. (These six lists are all called permutations
of S. People often use Greek letters like π and σ to indicate a permutation of a set.)


2. How many ways can we construct a k-list of distinct elements from a set? When
k = |S|, this is the previous question. If k = 2 and S = {x, y, z}, there are six ways:
xy, xz, yx, yz, zx and zy.


3. If the list in the previous question is allowed to contain repetitions, what is the answer?
There are nine ways for S = {x, y, z}: xx, xy, xz, yx, yy, yz, zx, zy and zz.


4. If, in Questions 2 and 3, the order in which the elements appear doesn’t matter,
what are the answers? For S = {x, y, z} and k = 2, the answers are three and six,
respectively. We are forming 2-sets and 2-multisets from the elements of S. The 2-sets
are {x, y}, {x, z} and {y, z}. The 2-multisets are the three 2-sets plus {x, x}, {y, y}
and {z, z}.


5. How many ways can the set S be partitioned into a collection of k pairwise disjoint
nonempty smaller sets? With k = 2, the set S = {x, y, z} has three such partitions:


{{x}, {y, z}}, {{x, y}, {z}} and {{x, z}, {y}}.


We will learn how to answer these questions without going through the time-consuming 
process of listing all the items in question as we did for our illustration. 
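For a set as small as S = {x, y, z}, brute-force listing remains a handy way to check answers. The following Python sketch is our own illustration, not part of the original text: it uses the standard `itertools` module to count the objects in Questions 1 to 5 with k = 2. The helper `set_partitions` is a hypothetical name introduced here.

```python
from itertools import combinations, combinations_with_replacement, permutations, product

S = ["x", "y", "z"]
k = 2

def set_partitions(elements):
    """Yield every partition of `elements` into nonempty blocks (a list of lists)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partial

answers = {
    "Q1 permutations of S": len(list(permutations(S))),
    "Q2 k-lists, no repeats": len(list(permutations(S, k))),
    "Q3 k-lists, repeats allowed": len(list(product(S, repeat=k))),
    "Q4 k-sets": len(list(combinations(S, k))),
    "Q4 k-multisets": len(list(combinations_with_replacement(S, k))),
    "Q5 partitions into k blocks": sum(1 for p in set_partitions(S) if len(p) == k),
}
for question, count in answers.items():
    print(f"{question}: {count}")
```

The counts printed are 6, 6, 9, 3, 6 and 3, matching the lists given above.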


How many ways can we construct a k-list (repeats allowed) using an n-set? Look at 
our illustration in Question 3 above. The first entry in the list could be x, y or z. After any 
of these there were three choices (x, y or z) for the second entry. Thus there are 3 x 3 = 9 
ways to construct such a list. The general pattern should be clear: There are n ways to 
choose each list entry. Thus 


Theorem 1 (k-lists with repetitions) There are $n^k$ ways to construct a k-list from
an n-set.


This calculation illustrates an important principle: 


Theorem 2 (Rule of Product) Suppose structures are to be constructed by making
a sequence of k choices such that (1) the ith choice can be made in $c_i$ ways, a number
independent of what choices were made previously, and (2) each structure arises in exactly
one way in this process. Then the number of structures is $c_1 \times \cdots \times c_k$.

Note: Sample and selection are often used in probability and statistics, where they may mean
a list or a multiset, depending on whether or not the collection is ordered.

Note (Question 5 above): In other words, each element of S appears in exactly one of the
smaller sets.


“Structures” as used above can be thought of simply as elements of a set. We prefer 
the term structures because it emphasizes that the elements are built up in some way; in 
this case, by making a sequence of choices. In the previous calculation, the structures are 
k-lists, which are built up by adding one element at a time. Each element is chosen from 
a given n-set and $c_1 = c_2 = \cdots = c_k = n$.
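As a quick numerical check of Theorem 1 (a sketch of our own, not from the original text), the hypothetical helper `count_k_lists` below generates every k-list over an n-set with Python's `itertools.product` and compares the count with $n^k$.

```python
from itertools import product

def count_k_lists(n, k):
    """Count k-lists (repeats allowed) from an n-set by generating all of them."""
    return sum(1 for _ in product(range(n), repeat=k))

# Every entry has n choices, independent of earlier entries, so the count is n**k.
for n in range(1, 5):
    for k in range(0, 4):
        assert count_k_lists(n, k) == n ** k

print(count_k_lists(3, 2))  # 9, as in Question 3
```

Note that `product(..., repeat=0)` yields a single empty tuple, which matches $n^0 = 1$.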


Definition 1 (Cartesian Product) If $C_1, \ldots, C_k$ are sets, the Cartesian product of
the sets is written $C_1 \times \cdots \times C_k$ and consists of all k-lists $(x_1, \ldots, x_k)$ with $x_i \in C_i$ for
$1 \le i \le k$.


For example, $\{1,2\} \times \{x\} \times \{a,b,c\}$ is a set containing the six lists 1xa, 1xb, 1xc, 2xa, 2xb
and 2xc.
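The same example can be reproduced with Python's `itertools.product`, which generates a Cartesian product directly. This minimal sketch (assuming Python 3.8 or later for `math.prod`) also confirms that the number of lists is $|C_1| \cdots |C_k|$.

```python
from itertools import product
from math import prod

C = [[1, 2], ["x"], ["a", "b", "c"]]          # the sets C_1, C_2, C_3 of the example
lists = ["".join(map(str, t)) for t in product(*C)]
print(lists)  # ['1xa', '1xb', '1xc', '2xa', '2xb', '2xc']

# Special case of the Rule of Product: |C_1 x ... x C_k| = |C_1| ... |C_k|.
assert len(lists) == prod(len(c) for c in C)
```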


A special case of the Rule of Product is the fact that the number of elements in


$C_1 \times \cdots \times C_k$ is the product $|C_1| \cdots |C_k|$. Here $C_i$ is the collection of ith choices and
$c_i = |C_i|$. This is only a special case because the Rule of Product would allow the collection
$C_i$ to depend on the previous choices $x_1, \ldots, x_{i-1}$ as long as the number $c_i$ of possible
choices does not depend on $x_1, \ldots, x_{i-1}$.
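To see the more general situation, here is a sketch (our own illustration; `lists_without_repeats` is a hypothetical helper) that builds k-lists with distinct entries one choice at a time. The set of allowed choices for each entry depends on the earlier entries, but its size does not, so the Rule of Product still gives $4 \times 3 \times 2 = 24$ for k = 3 and a 4-set.

```python
def lists_without_repeats(elements, k):
    """Build all k-lists with distinct entries, one entry at a time.

    The choices for entry i depend on entries 1..i-1 (those are removed),
    but their number, len(elements) - (i - 1), does not.
    """
    if k == 0:
        return [[]]
    result = []
    for prefix in lists_without_repeats(elements, k - 1):
        remaining = [e for e in elements if e not in prefix]  # depends on the prefix
        for e in remaining:
            result.append(prefix + [e])
    return result

three_lists = lists_without_repeats(["x", "y", "z", "w"], 3)
print(len(three_lists))  # 24 = 4 * 3 * 2
```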


Here is a property associated with Cartesian products that we will find useful in our 
later discussions. 


Definition 2 (Lexicographic order) If $C_1, \ldots, C_k$ are lists of distinct elements, we may
think of them as sets and form the Cartesian product $P = C_1 \times \cdots \times C_k$. The lexicographic
order on P is defined by saying that $(a_1, \ldots, a_k) <_L (b_1, \ldots, b_k)$ if and only if there is some
$t \le k$ such that $a_i = b_i$ for $i < t$ and $a_t < b_t$, where $a_t < b_t$ means that $a_t$ comes before $b_t$
in the list $C_t$. Usually we write $(a_1, \ldots, a_k) < (b_1, \ldots, b_k)$ instead of
$(a_1, \ldots, a_k) <_L (b_1, \ldots, b_k)$, because it is clear from the context that we are
talking about lexicographic order.
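Definition 2 translates almost word for word into code. In this sketch, `lex_less` is a hypothetical helper of our own: each $C_i$ is given as a Python list whose order defines the order of its elements, and the example lists are made up for illustration.

```python
def lex_less(a, b, orders):
    """Return True if a <_L b in the lexicographic order on C_1 x ... x C_k.

    `orders` is the list [C_1, ..., C_k]; the position of an element in C_i
    determines its order.
    """
    for a_i, b_i, C_i in zip(a, b, orders):
        if a_i != b_i:
            # This is the first position t where the lists differ: compare within C_t.
            return C_i.index(a_i) < C_i.index(b_i)
    return False  # a == b, so a is not strictly less than b

C = [["T", "S", "L"], ["A", "I"]]  # hypothetical orders: T before S before L; A before I
print(lex_less(("S", "I"), ("L", "A"), C))  # True: first entries differ and S precedes L
print(lex_less(("S", "I"), ("S", "A"), C))  # False: tie in position 1, and I follows A
```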


Often we say lex order instead of lexicographic order. If all the $C_i$’s equal
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


then lex order is simply numerical order of k-digit integers with leading zeroes allowed.
Suppose that all the $C_i$’s equal (<space>, A, B, ..., Z). If we throw out those elements of
P that have a letter following a space, the result is dictionary order. For example, BAT,
BATTERS and BATTLE are in lex order. Why? All agree in the first three positions.
The fourth position of BAT is <space>, which precedes all letters in our order. Similarly,
BATTERS comes before BATTLE because they first differ in the fifth position and E < L.
Unlike these two simple examples, the $C_i$’s usually vary with i.
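The dictionary-order example can be mimicked by padding each word with spaces to a common length k and mapping every character to its position in (<space>, A, B, ..., Z). This is a minimal sketch with hypothetical names (`ALPHABET`, `dictionary_key`); it relies on the fact that Python compares tuples lexicographically.

```python
ALPHABET = [" "] + [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # (<space>, A, ..., Z)

def dictionary_key(word, k):
    """Pad `word` with spaces to length k and map each character to its position in ALPHABET."""
    return tuple(ALPHABET.index(ch) for ch in word.ljust(k))

words = ["BATTLE", "BAT", "BATTERS"]
print(sorted(words, key=lambda w: dictionary_key(w, 7)))
# ['BAT', 'BATTERS', 'BATTLE']
```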
Example 2 (A simple count) The North-South streets in Rectangle City are named
using the numbers 1 through 12 and the East-West streets are named using the letters A
through H. The most southwesterly intersection occurs where 1 and A streets meet. How 
many blocks are within the city? 


Each block can be labeled by the streets at its southwesterly corner. These labels have 
the form (x,y) where x is between 1 and 11 inclusive and y is between A and G. (If you 
don’t see why 12 and H are missing, draw a picture and look at southwesterly corners.) 
By the Rule of Product there are 11 x 7 = 77 blocks. In this case the structures can be 
taken to be the descriptions of the blocks. Each description has two parts: the names of 
the North-South and East-West streets at the block’s southwest corner. □
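The count can be checked by listing the southwest-corner labels directly. This short sketch is our own illustration of the Rule of Product for Example 2.

```python
from itertools import product

ns_streets = range(1, 13)   # North-South streets 1 through 12
ew_streets = "ABCDEFGH"     # East-West streets A through H
# A block is labeled by its southwest corner, so street 12 and street H never occur.
blocks = list(product(ns_streets[:-1], ew_streets[:-1]))
print(len(blocks))  # 77 = 11 x 7
```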


Example 3 (Counting galactic names) In a certain land on a planet in a galaxy far 
away the alphabet contains only 5 letters which we will transliterate as A, I, L, S and T in 
that order. All names are 6 letters long, begin and end with consonants and contain two 
vowels which are not adjacent to each other. Adjacent consonants must be different. The 
list begins with LALALS, LALALT, LALASL, LALAST, LALATL, LALATS, LALILS and 
ends with TSITAT, TSITIL, TSITIS, TSITIT. How many possible names are there? 


The possible positions for the two vowels are (2,4), (2,5) and (3,5). Each of these 
results in two isolated consonants and two adjacent consonants. Thus the answer is the 
product of the following factors: 


1. choose the vowel locations (3 ways); 
2. choose the vowels (2 x 2 = 4 ways); 
3. choose the isolated consonants (3 x 3 = 9 ways); 
4. choose the adjacent consonants (3 x 2 = 6 ways). 


The answer is 3 x 4 x 9 x 6 = 648. This construction can be interpreted as a Cartesian
product as follows. $C_1$ is the set of lists of possible positions for the vowels, $C_2$ is the set
of lists of vowels in those positions, and $C_3$ and $C_4$ are sets of lists of consonants. Thus


$C_1$ = {(2,4), (2,5), (3,5)}, $C_2$ = {AA, AI, IA, II},


$C_3$ = {LL, LS, LT, SL, SS, ST, TL, TS, TT}, $C_4$ = {LS, LT, SL, ST, TL, TS}.


For example, ((2,5), IA, SS, ST) in the Cartesian product corresponds to the name
SISTAS. □
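Because there are only $5^6 = 15625$ six-letter strings over this alphabet, the count of 648 can also be confirmed by brute force. In this sketch (our own; `is_name` is a hypothetical helper), "contain two vowels" is read as exactly two vowels, as in the example. Since A < I < L < S < T is also the usual alphabetical order, Python's string sort gives the dictionary order used in the example.

```python
from itertools import product

LETTERS = "AILST"   # the alphabet, in its given order
VOWELS = set("AI")

def is_name(word):
    """Check the naming rules of Example 3."""
    if word[0] in VOWELS or word[-1] in VOWELS:
        return False                      # must begin and end with consonants
    if sum(ch in VOWELS for ch in word) != 2:
        return False                      # exactly two vowels
    for a, b in zip(word, word[1:]):
        if a in VOWELS and b in VOWELS:
            return False                  # vowels may not be adjacent
        if a not in VOWELS and a == b:
            return False                  # adjacent consonants must differ
    return True

names = sorted(w for w in map("".join, product(LETTERS, repeat=6)) if is_name(w))
print(len(names))   # 648
print(names[:7])    # the first seven names listed in the example
print(names[-4:])   # the last four names listed in the example
```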


Here’s another important principle, the proof of which is self-evident:


Theorem 3 (Rule of Sum) Suppose a set T of structures can be partitioned into sets
$T_1, \ldots, T_j$ so that each structure in T appears in exactly one $T_i$. Then


$|T| = |T_1| + \cdots + |T_j|$.
Example 4 (Counting galactic names again) We redo the previous example using
the Rule of Sum. The possible vowel (V) and consonant (C) patterns for names are
CCVCVC, CVCCVC and CVCVCC. Since these patterns are disjoint and cover all cases, we
may compute the number of names of each type and add the results together. For the first
pattern we have a product of six factors, one for each choice of a letter: 3 x 2 x 2 x 3 x 2 x 3 =
216. The other two patterns also give 216, for a total of 216 + 216 + 216 = 648 names.


This approach has a wider range of applicability than the method we used in the previous
example. We were only able to avoid the Rule of Sum in the first method because each
pattern contained the same number of vowels, isolated consonants and adjacent consonants.
Here’s an example that requires the Rule of Sum. Suppose a name consists of only four
letters, namely two vowels and two consonants, constructed so that the vowels are not adjacent
and, if the consonants are adjacent, then they are different. There are three patterns:
CVCV, VCVC, VCCV. By the Rule of Product, the first two are each associated with 36
names, but VCCV is associated with only 24 names because of the adjacent consonants.
Hence, we cannot choose a pattern and then proceed to choose vowels and consonants. On
the other hand, we can apply the Rule of Sum to get a total of 36 + 36 + 24 = 96 names. □
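The pattern-by-pattern computation can be sketched in Python: the Rule of Product is applied within each pattern (here by generating the choices with `itertools.product` and filtering out equal adjacent consonants), and the Rule of Sum adds the patterns. The helper `count_pattern` is a hypothetical name of our own.

```python
from itertools import product

VOWELS, CONSONANTS = "AI", "LST"

def count_pattern(pattern):
    """Count names fitting a V/C pattern, requiring adjacent consonants to differ."""
    total = 0
    for choice in product(*(VOWELS if p == "V" else CONSONANTS for p in pattern)):
        if all(not (p == q == "C" and a == b)
               for p, q, a, b in zip(pattern, pattern[1:], choice, choice[1:])):
            total += 1
    return total

counts = {p: count_pattern(p) for p in ["CVCV", "VCVC", "VCCV"]}
print(counts)                # {'CVCV': 36, 'VCVC': 36, 'VCCV': 24}
print(sum(counts.values()))  # 96, by the Rule of Sum
```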


Example 5 (Smorgasbord College committees) Smorgasbord College has four departments
which have 6, 35, 12 and 7 faculty members. The president wishes to form a
faculty judicial committee to hear cases of student misbehavior. To avoid the possibility of 
ties, the committee will have three members. To avoid favoritism the committee members 
will be from different departments and the committee will change daily. If the committee 
only sits during the normal academic year (165 days), how many years can pass before a 
committee must be repeated? 


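If T is the set of all possible committees, the answer is |T|/165. Let $T_i$ be the set
of committees with no members from the ith department. Since each committee draws its three
members from exactly three of the four departments, each committee lies in exactly one $T_i$,
so by the Rule of Sum $|T| = |T_1| + |T_2| + |T_3| + |T_4|$. By the Rule of Product


$|T_1| = 35 \times 12 \times 7 = 2940$, $|T_2| = 6 \times 12 \times 7 = 504$,
$|T_3| = 35 \times 6 \times 7 = 1470$, $|T_4| = 35 \times 12 \times 6 = 2520$.


Thus |T| = 2940 + 504 + 1470 + 2520 = 7434, and the number of years is 7434/165 = 45+.
Due to faculty turnover, a committee need never repeat (if the president’s policy lasts that
long). □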
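The same count can be sketched in a few lines of Python (assuming Python 3.8 or later for `math.prod`): choosing which three departments are represented is the "OR" part (Rule of Sum), and choosing one person from each of them is the "AND" part (Rule of Product).

```python
from itertools import combinations
from math import prod

dept_sizes = [6, 35, 12, 7]
# Sum over the choice of three departments; multiply the department sizes within each choice.
total = sum(prod(trio) for trio in combinations(dept_sizes, 3))
print(total)        # 7434
print(total / 165)  # about 45.05 academic years
```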


Whenever we encounter a new technique, there are two questions that arise:
• When is it used?
• How is it used?
For the Rules of Sum and Product, the answers are intertwined: 


Suppose you wish to count the number of structures in a set and that you 
can describe how to construct the structures in terms of subconstructions 
that are connected by “ands” and “ors.” If this leads to the construction 
of each structure in a unique way, then the Rules of Sum and Product 
apply. To use them, replace “ands” by products and “ors” by sums. 
Whenever you write something like “Do A AND do B,” it should mean 
“Do A AND then do B” because the Rule of Product requires that the 
choices be made sequentially. Remember that the number of ways to do 
B must not depend on the choice for A. 
Example 6 (Applying the sum-product rules) To see how this technique is applied,
let’s look back at Example 5. A committee consists of either 


1. One person from Dept. 1 AND one person from Dept. 2 AND one person from Dept. 3, 


OR 


2. One person from Dept. 1 AND one person from Dept. 2 AND one person from Dept. 4, 


OR 


3. One person from Dept. 1 AND one person from Dept. 3 AND one person from Dept. 4, 


OR 


4. One person from Dept. 2 AND one person from Dept. 3 AND one person from Dept. 4. 


The number of ways to choose a person from a department equals the number of people in 
the department. □


Until you become comfortable using the Rules of Sum and Product, look for “and” 
and “or” in what you do. This is an example of the useful tactic: 


Step 1: Break the problem into parts. 
Step 2: Work on each piece separately. 
Step 3: Put the pieces together. 


Here Step 1 is getting a phrasing with “ands” and “ors;” Step 2 is calculating each of the 
individual pieces; and Step 3 is applying the Rules of Sum and Product. 


Exercises for Section 1 


The following exercises will give you additional practice on lists with repetition and the 
Rules of Sum and Product. 


In each exercise, indicate how you are using the Rules of Sum and Product.


1.1. Suppose a bookshelf contains five discrete math texts, two data structures texts,
six calculus texts, and three Java texts. (All the texts are different.)


(a) How many ways can you choose one of the texts?


(b) How many ways can you choose one of each type of text?


1.2. How many different three-digit positive integers are there? (No leading zeroes are
allowed.) How many positive integers with at most three digits? What are the
answers when “three” is replaced by “n”?


1.3. Prove that the number of subsets of a set S, including the empty set and S itself,
is $2^{|S|}$.


1.4. Suppose n > 1. An n-digit number is a list of n digits where the first digit in the
list is not zero.
(a) How many n-digit numbers are there?
(b) How many n-digit numbers contain no zeroes?


(c) How many n-digit numbers contain at least one zero?
Hint: Use (a) and (b).


1.5. For this exercise, we work with the ordinary alphabet of 26 letters.


(a) Define a “4-letter word” to be any list of 4 letters that contains at least one of
the vowels A, E, I, O and U. How many 4-letter words are there?


(b) Suppose, instead, we define a “4-letter word” to be any list of 4 letters that
contains exactly one of the vowels A, E, I, O and U. How many 4-letter words
are there?


1.6. In a certain land on a planet in a galaxy far away the alphabet contains only
5 letters which we will transliterate as A, I, L, S and T in that order. All names
are 5 letters long, begin and end with consonants and contain two vowels which
are not adjacent to each other. Adjacent consonants must be different. How many
names are there?


1.7. A composition of a positive integer n is a list of positive integers (called parts)
that sum to n. The four compositions of 3 are 3; 2,1; 1,2 and 1,1,1.


(a) By considering ways to insert plus signs and commas in a list of n ones, obtain
the formula $2^{n-1}$ for the number of compositions of n. To avoid confusion with
the Rule of Sum, we’ll write this plus sign as ⊕. (The four compositions 3; 2,1;
1,2 and 1,1,1 correspond to 1⊕1⊕1; 1⊕1,1; 1,1⊕1 and 1,1,1, respectively.)


(b) List all compositions of 4.
(c) List all compositions of 5 with 3 parts.


1.8. In Example 3 we found that there were 648 possible names. Suppose that these are
listed in the usual dictionary order. The last word in the first third of the dictionary
is LTITIT (the 216th word). The first word in the middle third is SALALS. Explain.


1.9. There is another possible lexicographic order on the names in Example 3 (Counting
galactic names) that gives rise to a “nonstandard” lex order on this list of names.
Using the interpretation of the list of names as the Cartesian product of the lists
$C_1 \times C_2 \times C_3 \times C_4$, we can lexicographically order the entire list of names based on
the following linear orderings of the $C_i$, i = 1, 2, 3, 4:


$C_1$ = ((2,4), (2,5), (3,5)), $C_2$ = (AA, AI, IA, II),


$C_3$ = (LL, LS, LT, SL, SS, ST, TL, TS, TT), $C_4$ = (LS, LT, SL, ST, TL, TS).


What are the first seven and last seven entries in this lex ordering?
Hint: The lex ordering can be done entirely in terms of the sets $C_i$ and then
translated to the names as needed. Thus the first two entries in the list
$C_1 \times C_2 \times C_3 \times C_4$ in lex order are (2,4)(AA)(LL)(LS) and (2,4)(AA)(LL)(LT). The last
two are (3,5)(II)(TT)(TL) and (3,5)(II)(TT)(TS). These translate to LALALS and
LALALT for the first two and TLITIT and TSITIT for the last two.


1.10. Recall that the size of a multiset is the number of elements it contains. For example, 
the size of {a,a,b} is three. 


(a) How many 4-element multisets are there whose elements are taken from the set
{a, b, c}? (An element may be taken more than once.)


(b) How many multisets are there whose elements are taken from the set {a,b,c}? 


Section 2: Lists Without Repetition 


What happens if we do not allow repeats in our list? Suppose we have n elements to choose
from and wish to form a k-list with no repeats. How many lists are there? We can choose
the first entry in the list AND choose the second entry AND ⋯ AND choose the kth entry.
There are $n - (i-1) = n - i + 1$ ways to choose the ith entry since $i - 1$ elements have
been removed from the set to make the first part of the list. By the Rule of Product, the
number of lists is $n(n-1)\cdots(n-k+1)$. Using the notation n! for the product of the first
n positive integers and writing 0! = 1, you should be able to see that this answer can be written as
$n!/(n-k)!$, which is often designated by $(n)_k$ and called the falling factorial. Some authors
write the falling factorial as $n^{\underline{k}}$. We have proven


Theorem 4 (k-lists without repetition) When repeats are not allowed, there are
$n!/(n-k)! = (n)_k$ k-lists that can be constructed from an n-set. (When k > n the answer
is zero.)
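Theorem 4 is easy to test against brute force. In this sketch (assuming Python 3.8 or later), `falling_factorial` is a hypothetical helper that multiplies out $n(n-1)\cdots(n-k+1)$; we compare it with the number of k-lists produced by `itertools.permutations` and with the standard library's `math.perm`, which also returns 0 when k > n.

```python
from itertools import permutations
from math import factorial, perm

def falling_factorial(n, k):
    """(n)_k = n (n-1) ... (n-k+1); the factor n - n = 0 appears when k > n."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

for n in range(0, 6):
    for k in range(0, 7):
        brute = sum(1 for _ in permutations(range(n), k))  # k-lists without repeats
        assert falling_factorial(n, k) == brute == perm(n, k)
        if k <= n:
            assert falling_factorial(n, k) == factorial(n) // factorial(n - k)

print(falling_factorial(5, 3))  # 60
```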


When k = n, a list without repeats is simply a linear ordering of the set. We frequently say
“ordering” instead of “linear ordering.” An ordering is sometimes called a “permutation”
of S. Thus, we have proven that a set S can be (linearly) ordered in |S|! ways.


Example 7 (Lists without repeats) How many lists without repeats can be formed
from a 5-set? There are 5! = 120 5-lists without repeats, 5!/1! = 120 4-lists without repeats,
5!/2! = 60 3-lists, 5!/3! = 20 2-lists and 5!/4! = 5 1-lists. By the Rule of Sum, this gives a
total of 325 lists, or 326 if we count the empty list. □
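The Rule of Sum computation in Example 7 amounts to adding falling factorials. A minimal sketch using `math.perm` (Python 3.8 or later), where `perm(5, k)` is the number of k-lists without repeats from a 5-set:

```python
from math import perm

counts = [perm(5, k) for k in range(6)]  # k = 0 (the empty list) through k = 5
print(counts)           # [1, 5, 20, 60, 120, 120]
print(sum(counts[1:]))  # 325 nonempty lists
print(sum(counts))      # 326 including the empty list
```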
Example 8 (Linear arrangements) How many different ways can 100 people be arranged
in the seats in a classroom that has exactly 100 seats? Each seating is simply an
ordering of the people. Thus the answer is 100!. Simply writing 100! probably gives you
little idea of the size of the number of seatings. A useful approximation for factorials is
given by Stirling’s formula.


Theorem 5 (Stirling’s formula) $\sqrt{2\pi n}\,(n/e)^n$ approximates n! with a relative error
less than $1/(10n)$.


We say that f(x) approximates g(x) with a relative error of $|f(x)/g(x) - 1|$. Thus, the
theorem states that $\sqrt{2\pi n}\,(n/e)^n/n!$ differs from 1 by less than $1/(10n)$. When relative error
is multiplied by 100, we obtain “percentage error.” By Stirling’s formula, we find that 100!
is nearly $9.32 \times 10^{157}$, which is much larger than estimates of the number of atoms in the
universe. □
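Stirling's formula is easy to compare with exact factorials. This sketch (our own; `stirling` is a hypothetical helper) checks the relative-error bound of Theorem 5 for n = 1, ..., 100 using ordinary floating-point arithmetic, which is adequate here because 100! is still well within the range of a Python float.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)**n to n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in range(1, 101):
    rel_err = abs(stirling(n) / math.factorial(n) - 1)
    assert rel_err < 1 / (10 * n)  # the bound stated in Theorem 5

print(f"{stirling(100):.3e}")               # 9.325e+157
print(f"{float(math.factorial(100)):.3e}")  # 9.333e+157
```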


We can extend the ideas of the previous example. Suppose we still have 100 seats but 
have only 95 people. We need to think a bit more carefully than before. One approach is 
to put the people in some order, select a list of 95 seats, and then pair up people and seats 
so that the first person gets the first seat, the second person the second seat, and so on. By 
the general formula for lists without repetition, the answer is 100!/(100 - 95)! = 100!/120.
We can also solve this problem by thinking of the people as positions in a list and the seats
as entries! Thus we want to form a 95-list using the 100 seats. According to Theorem 4,
this can be done in 100!/(100 - 95)! ways.


Lists can appear in many guises. As seen in the previous paragraph, the people could 
be thought of as the positions in a list and the seats the things in the list. Sometimes it 
helps to find a reinterpretation like this for a problem. At other times it is easier to tackle 
the problem starting over again from scratch. These methods can lead to several approaches 
to a problem. That can make the difference between a solution and no solution or between 
a simple solution and a complicated one. You should practice using both methods, even on 
the same problem. 


Example 9 (Circular arrangements) How many ways can n people be seated on a 
Ferris wheel with exactly one person in each seat? Equivalently, we can think of this as 
seating the people at a circular table with n chairs. Two seatings are defined to be “the 
same” if one can be obtained from the other by rotating the Ferris wheel (or rotating the 
seats around the table). 


If the people were seated in a straight line instead of in a circle, the answer would
be n!. Can we convert the circular seating into a linear seating (i.e., a list)? In other
words, can we convert the unsolved problem to a solved one? Certainly: simply cut the
circular arrangement between two people and unroll it. Thus, to arrange n people in a
linear ordering,

first arrange them in a circle AND then cut the circle.


According to our AND/OR technique, we must prove that each linear arrangement arises 
in exactly one way with this process. 
• Since a linear seating can be rolled up into a circular seating, it can also be obtained
by unrolling that circular seating. Hence each linear seating arises at least once.


• Since the people at the circular table are all different, the place we cut the circle
determines who the first person in the linear seating is, so each cutting of a circular
seating gives a different linear seating. Obviously two different circular seatings cannot
give the same linear seating. Hence each linear seating arises at most once.


Putting these two observations together, we see that each linear seating arises exactly once. 
By the Rule of Product, 


n! = (number of circular arrangements) x n. 


Hence the number of circular arrangements is n!/n = (n - 1)!.
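The count $(n-1)!$ can be confirmed by brute force for small n. In this sketch (our own illustration), two seatings are treated as the same circular arrangement when one is a rotation of the other, and the hypothetical helper `canonical_rotation` picks the lexicographically smallest rotation as a representative.

```python
from itertools import permutations
from math import factorial

def canonical_rotation(seating):
    """Return the smallest rotation of `seating`: one representative per circular seating."""
    return min(seating[i:] + seating[:i] for i in range(len(seating)))

for n in range(1, 7):
    circular = {canonical_rotation(p) for p in permutations(range(n))}
    assert len(circular) == factorial(n - 1)

print(len({canonical_rotation(p) for p in permutations(range(5))}))  # 24 = 4!
```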


Our argument was somewhat indirect. We can derive the result by a more direct 
argument. For convenience, let the people be called 1 through n. We can read off the 
people in the circular list starting with person 1. This gives a linear ordering of {1,...,n} 
that starts with 1. Conversely, each such linear ordering gives rise to a circular ordering. 
Thus the number of circular orderings equals the number of such linear orderings. Having 
listed person 1, there are (n - 1)! ways to list the remaining n - 1 people.


If we are making circular necklaces using n distinct beads, then the arguments we have 
just given prove that there are (n - 1)! possible necklaces provided we are not allowed to
flip necklaces over. 


What happens if the beads are not distinct? For example, suppose there are three 
blue beads and three yellow beads. There are just two linear arrangements associated with 
the circular arrangement BYBYBY, namely (B,Y,B,Y,B,Y) and (Y,B,Y,B,Y,B). But there 
are six linear arrangements associated with the circular arrangement BBBYYY. Thus, the 
approach we used for distinct beads fails, because the number of lists associated with a 
necklace depends on the necklace. For now, you only need to be aware of this complication. □
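The complication can be seen concretely by cutting a necklace at each of its positions and collecting the distinct lists that result. This short sketch (our own; `linear_arrangements` is a hypothetical helper) ignores flipping, as in the text.

```python
def linear_arrangements(necklace):
    """Distinct lists obtained by cutting the circular string at each position."""
    return {necklace[i:] + necklace[:i] for i in range(len(necklace))}

print(len(linear_arrangements("BYBYBY")))  # 2
print(len(linear_arrangements("BBBYYY")))  # 6
```

Because the number of lists per necklace is not constant, dividing by n no longer counts necklaces correctly.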


We need not insist on “no repetitions at all” in lists. There are natural situations 
in which some repetitions are allowed and others are not allowed. The following example 
illustrates one such way that this can happen. 


Example 10 (Words from a collection of letters, first try) How many “words” of
length k can be formed from the letters in ERROR when no letter may be used more often
than it appears in ERROR? (A “word” is any list of letters, pronounceable or not.) You
can imagine that you have 5 tiles, namely one E, one O, and three R’s. The answer is not
$3^k$ even though we are using 3 different letters. Why is this? Unlimited repetition is not
allowed so, for example, we cannot have EEE. On the other hand, the answer is not $(3)_k$,
since R can be repeated some. Also, the answer is not $(5)_k$, even though we have 5 tiles.
Why is this? The formula $(5)_k$ arises if we have 5 distinct objects; however, our 3 tiles with
R are identical. At present, all we can do is carefully list the possibilities. Here they are in
alphabetical order.

k = 1: E, O, R 

k = 2: EO, ER, OE, OR, RE, RO, RR 

k = 3: EOR, ERO, ERR, OER, ORE, ORR, REO, RER, ROE, ROR, RRE, RRO, RRR 

k = 4: EORR, EROR, ERRO, ERRR, OERR, ORER, ORRE, ORRR, REOR, RERO, 
RERR, ROER, RORE, RORR, RREO, RRER, RROE, RROR, RRRE, RRRO 

k = 5: EORRR, ERORR, ERROR, ERRRO, OERRR, ORERR, ORRER, ORRRE, 
REORR, REROR, RERRO, ROERR, RORER, RORRE, RREOR, RRERO, 
RROER, RRORE, RRREO, RRROE 


This is obviously a tedious process. We shall return to this type of problem in the next 
section. □
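The listing is tedious by hand but easy to mechanize. The following is a minimal Python sketch, assuming we simply try every word over the letters of ERROR and keep those that do not use a letter more often than the tiles allow. The function name `words_from_tiles` is our own. Because `itertools.product` is given the letters in sorted order, it produces the words in alphabetical order.

```python
from collections import Counter
from itertools import product

def words_from_tiles(tiles, k):
    """All length-k words, in alphabetical order, that use each letter
    no more often than it appears in `tiles`."""
    supply = Counter(tiles)
    letters = sorted(supply)
    result = []
    for word in product(letters, repeat=k):       # every word over the letters
        used = Counter(word)
        if all(used[c] <= supply[c] for c in used):
            result.append("".join(word))
    return result

for k in range(1, 6):
    print(k, len(words_from_tiles("ERROR", k)))   # 3, 7, 13, 20, 20 words
```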


Exercises for Section 2 


The following exercises will give you additional practice with lists with restricted repe- 
titions. 


In each exercise, indicate how you are using the Rules of Sum and Product. 


It is instructive to first do these exercises using only the techniques introduced so far 
and then, after reading the next section, to return to these exercises and look for other 
ways of doing them. 


2.1. We want to know how many ways 3 boys and 4 girls can sit in a row. 
(a) How many ways can this be done if there are no restrictions? 


(b) How many ways can this be done if the boys sit together and the girls sit 
together? 


(c) How many ways can this be done if the boys and girls must alternate? 
2.2. Repeat the previous exercise when there are 3 boys and 3 girls. 
2.3. What are the answers to the previous two exercises if they sit around a circular table? 


2.4. How many ways are there to form a list of two distinct letters from the set of letters 
in the word COMBINATORICS? three distinct letters? four distinct letters? 


2.5. How many ways are there to form a list of two letters from the set of letters in 
the word COMBINATORICS if the letters cannot be used more often than they 
appear in COMBINATORICS? three letters? 


2.6. We are interested in forming 3 letter words (“3-words”) using the letters in LITTLEST. 
For the purposes of the problem, a “word” is any list of letters. 
(a) How many words can be made with no repeated letters? 
(b) How many words can be made with unlimited repetition allowed? 


(c) How many words can be made if repeats are allowed but no letter can be used 
more often than it appears in LITTLEST? 


2.7. By 2050 spelling has deteriorated considerably. The dictionary defines the spelling 
of “relief” to be any combination (with repetition allowed) of the letters R, L, F, I 
and E subject to certain constraints: 


• The number of letters must not exceed 6. 
• The word must contain at least one L. 
• The word must begin with an R and end with an F. 
• There is just one R and one F. 
(a) How many spellings are possible? 


(b) The most popular spelling is the one that, in dictionary order, is five before 
the spelling RELIEF. What is it? 


*2.8. By the year 2075, further deterioration in spelling has occurred. The dictionary 
now defines the spelling of “relief” to be any combination (with repetition allowed) 
of the letters R, L, F, I and E subject to these constraints: 


• The number of letters must not exceed 6. 
• The word must contain at least one L. 
• The word must begin with a nonempty string of R’s and end with a nonempty 
string of F’s, and there are no other R’s and F’s. 


(a) How many spellings are possible? 


(b) The most popular spelling is the one that, in dictionary order, is five before 
the spelling RELIEF. What is it? 


*2.9. Prove that the number of lists without repeats that can be constructed from an 
n-set is very nearly n!e. Your count should include lists of all lengths from 0 to n. 
Hint: Recall from Taylor’s Theorem in calculus that e^x = 1 + x + x^2/2! + x^3/3! + ⋯. 


Section 3: Sets 


We first review some standard terminology and notation associated with sets. When we 
discuss sets, we usually have a “universal set” U in mind, and the sets we discuss are subsets 
of U. For example, U = Z might be the integers. We then speak of the natural numbers 
N = {0,1,2,...}, the positive integers N^+, the odd integers N_o, etc., thinking of these sets 
as subsets of the “universal set” Z. 


Definition 3 (Set notation) A set is an unordered collection of distinct objects. We 
use the notation x ∈ S to mean “x is an element of S” and x ∉ S to mean “x is not an 
element of S.” Given two subsets (subcollections) X and Y of U, we say “X is a subset 
of Y,” written X ⊆ Y, if x ∈ X implies that x ∈ Y. Alternatively, we may say that “Y is 
a superset of X.” X ⊆ Y and Y ⊇ X mean the same thing. We say that two subsets X 
and Y of U are equal if X ⊆ Y and Y ⊆ X. We use braces to designate sets when we wish 
to specify or describe them in terms of their elements: A = {a,b,c}, B = {2,4,6,...}. A 
set with k elements is called a k-set or set with cardinality k. The cardinality of a set A 
is denoted by |A|. 


Since a set is an unordered collection of distinct objects, the following all describe the 
same 3-element set 


{a,b,c} = {b, a,c} = {c, b,a} = {a, b, b,c, b}. 


The first three are simply listing the elements in a different order. The last happens to 
mention some elements more than once. But, since a set consists of distinct objects, the 
elements of the set are still just a, b, c. Another way to think of this is: 


Two sets A and B are equal if and only if every element of A is an 
element of B and every element of B is an element of A. 


Thus, with A = {a,b,c} and B = {a,b,b,c,b}, we can see that everything in A is in B and 
everything in B is in A. You might think “When we write a set, the elements are in the 
order written, so why do you say a set is not ordered?” When we write something down 
we’re stuck — we have to list them in some order. You can think of a set differently: Write 
each element on a separate slip of paper and put the slips in a paper bag. No matter how 
you shake the bag, it’s still the same set. 


For the most part, we shall be dealing with finite sets. Let U be a set and let A and 
B be subsets of U. 


• The sets A ∩ B and A ∪ B are the intersection and union of A and B. 

• The set A \ B = {x : x ∈ A, x ∉ B} is the set difference of A and B. It is also written 
A − B. 

• The set U \ A or A^c is the complement of A (relative to U). The complement of A is 
also written A′ and ~A. 

• The set A ⊕ B = (A \ B) ∪ (B \ A) is the symmetric difference of A and B. 

• The set A × B = {(x, y) : x ∈ A, y ∈ B} is the product or Cartesian product of A and 
B. 
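Python's built-in `set` type provides each of these operations, which makes small experiments easy. The sketch below chooses U, A and B arbitrarily for illustration. Python has no built-in complement, so we take the complement relative to an explicit universal set U.

```python
from itertools import product

U = set(range(10))        # a small universal set, chosen for illustration
A = {1, 2, 3, 4}
B = {3, 4, 5}

print(A & B)              # intersection A ∩ B
print(A | B)              # union A ∪ B
print(A - B)              # set difference A \ B
print(U - A)              # complement of A relative to U
print(A ^ B)              # symmetric difference A ⊕ B
print((A - B) | (B - A) == A ^ B)                             # definition of ⊕
print(set(product(A, B)) == {(x, y) for x in A for y in B})   # Cartesian product
```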
Example 11 (Cardinality of various sets) Recall that |S|, the cardinality of the set 
S, is its size; that is, the number of elements in the set. 


By the Rule of Product, |A × B| = |A| × |B|. (The first multiplication is Cartesian 
product; the second is multiplication of numbers.) Also, by the Rule of Product, the number 
of subsets of A is 2^|A|. To see this, notice that for each element of A we have two choices: 
include the element in the subset or do not include it. 


What about things like |A ∪ B| and |A ⊕ B|? They can’t be expressed just in terms of |A| 
and |B|. To see this, note that if A = B, then |A ∪ B| = |A| and |A ⊕ B| = |∅| = 0. On the 
other hand, if A and B have no common elements, |A ∪ B| = |A| + |B| and |A ⊕ B| = |A| + |B| 
as well. Can we say anything in general? Yes. We'll return to this later. 
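The counts in Example 11 can be checked directly with a small Python sketch of our own. Subsets are built exactly as the text describes, with an "include or not" decision for each element. The last loop repeats the two extreme cases from the text: equal sets and disjoint sets of the same sizes.

```python
from itertools import product

def all_subsets(A):
    """All subsets of A: for each element, either leave it out or put it in."""
    subsets = [set()]
    for x in A:
        subsets += [s | {x} for s in subsets]   # the "include x" choices
    return subsets

A = {"a", "b", "c"}
B = {1, 2}
print(len(all_subsets(A)), 2 ** len(A))            # 8 8
print(len(set(product(A, B))), len(A) * len(B))    # 6 6

# |X ∪ Y| and |X ⊕ Y| are not determined by |X| and |Y| alone:
for X, Y in [({1, 2, 3}, {1, 2, 3}), ({1, 2, 3}, {4, 5, 6})]:
    print(len(X), len(Y), len(X | Y), len(X ^ Y))
```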


The algebraic rules for operating with sets are also familiar to most beginning university 
students. Here is such a list of the basic rules. In each case the standard name of the rule 
is given first, followed by the rule as applied first to ∩ and then to ∪. 


Theorem 6 (Algebraic rules for sets) The universal set U is not mentioned explicitly 
but is implicit when we use the notation ~X = U − X for the complement of X. An 
alternative notation is X^c = ~X. 


Associative:      (P ∩ Q) ∩ R = P ∩ (Q ∩ R)          (P ∪ Q) ∪ R = P ∪ (Q ∪ R)
Distributive:     P ∩ (Q ∪ R) = (P ∩ Q) ∪ (P ∩ R)    P ∪ (Q ∩ R) = (P ∪ Q) ∩ (P ∪ R)
Idempotent:       P ∩ P = P                          P ∪ P = P
Double Negation:  ~~P = P
DeMorgan:         ~(P ∩ Q) = ~P ∪ ~Q                 ~(P ∪ Q) = ~P ∩ ~Q
Absorption:       P ∪ (P ∩ Q) = P                    P ∩ (P ∪ Q) = P
Commutative:      P ∩ Q = Q ∩ P                      P ∪ Q = Q ∪ P


These rules are “algebraic” rules for working with ∩, ∪, and ~. You should memorize them 
as you use them. They are used just like rules in ordinary algebra: whenever you see an 
expression on one side of the equal sign, you can replace it by the expression on the other 
side. 
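Each rule is an identity between sets, so it can be spot-checked by brute force. The following sketch is our own. It tests every rule of Theorem 6 on every choice of P, Q, R among the 16 subsets of a 4-element universe. Passing such a check is evidence, not a proof.

```python
from itertools import combinations

U = frozenset(range(4))   # small universal set chosen for the check

def subsets(S):
    """All subsets of S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(X):
    """Complement ~X = U - X."""
    return U - X

RULES = {
    "associative ∩": lambda P, Q, R: (P & Q) & R == P & (Q & R),
    "associative ∪": lambda P, Q, R: (P | Q) | R == P | (Q | R),
    "distributive ∩ over ∪": lambda P, Q, R: P & (Q | R) == (P & Q) | (P & R),
    "distributive ∪ over ∩": lambda P, Q, R: P | (Q & R) == (P | Q) & (P | R),
    "idempotent": lambda P, Q, R: P & P == P and P | P == P,
    "double negation": lambda P, Q, R: comp(comp(P)) == P,
    "DeMorgan": lambda P, Q, R: (comp(P & Q) == comp(P) | comp(Q)
                                 and comp(P | Q) == comp(P) & comp(Q)),
    "absorption": lambda P, Q, R: P | (P & Q) == P and P & (P | Q) == P,
    "commutative": lambda P, Q, R: P & Q == Q & P and P | Q == Q | P,
}

S = subsets(U)
for name, rule in RULES.items():
    ok = all(rule(P, Q, R) for P in S for Q in S for R in S)
    print(f"{name}: {ok}")
```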


We use the notation P(A) to denote the set of all subsets of A and P_k(A) the set of all 
subsets of A of size (or cardinality) k. (In the previous example, we saw that |P(A)| = 2^|A|.) 
Let C(n,k) = |P_k(A)| denote the number of different k-subsets that can be formed from an 
n-set A. The notation $\binom{n}{k}$ is also frequently used. These are called binomial coefficients 
and are read “n choose k.” How do we compute C(n,k)? 


Can we rephrase the problem in a way that converts it to a list problem, since we 
know how to solve those? In other words, can we relate this problem, where order does not 
matter, to a problem where order matters? 


Let’s consider all possible orderings of each of our k-sets. This gives us a way to 
construct all lists with distinct elements in two steps: First construct a k-set, then order 
it.⁴ We can order a k-set by forming a k-list without repeats from the k-set. By Theorem 4 
of Section 2, we know that this can be done in k! ways. By the Rule of Product, there 
are C(n,k)·k! distinct k-lists with no repeats. By Theorem 4 again, this number is 
n(n − 1)⋯(n − k + 1) = n!/(n − k)!. Dividing by k!, we have 


Theorem 7 (Binomial coefficient formula) The value of the binomial coefficient is 

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!}.$$

Furthermore, $\binom{n}{k} = \binom{n}{n-k}$. 


⁴ We used an idea like this in Example 9 when we counted circular lists with distinct 
elements. 


Example 12 (Computing binomial coefficients) Let’s compute some binomial coef- 


ficients for practice. 
7 7 
~~ EX 6x5 — 35, 
3 3! 


because n = 7,k =3 andson—k+1=5. Alternatively, 


7 7  1x2x3x4x5x6x7 
By Sl aly” ix Oe BC kK ax 8K A)? 
which again gives 35 after some work. 

How about computing era Using the formula Bap-@) involves a lot of writing and 
then a lot of cancellation (there are common factors in the numerator and denominator). 
There is a quicker way. By the last sentence in the theorem, (Ga) = ae Now we have 


10 2 
() ae ext — 66. oO 
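Theorem 7 and the shortcut used in Example 12 translate directly into a short function. This is a sketch of our own, and the name `binomial` is hypothetical. Python's standard library also provides `math.comb`, which we use here only as a cross-check.

```python
from math import comb, factorial

def binomial(n, k):
    """C(n, k) = n(n-1)...(n-k+1) / k!, using C(n, k) = C(n, n-k)
    to keep the product short."""
    if k < 0 or k > n:
        return 0               # convention: an n-set has no such k-subsets
    k = min(k, n - k)          # the symmetry from the last part of Theorem 7
    numerator = 1
    for i in range(k):
        numerator *= n - i     # builds n(n-1)...(n-k+1)
    return numerator // factorial(k)

print(binomial(7, 3), binomial(12, 10))   # 35 66
```

Taking the smaller of k and n − k is the same trick as replacing C(12,10) by C(12,2). Because the product of k consecutive integers is always divisible by k!, the integer division gives an exact result.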


*Example 13 (A generating function for binomial coefficients) We’ll now approach 
the problem of evaluating C(n, k) in another way. In other words, we’ll “forget” the formula 
we just derived and start over with a new approach. You may ask “Why waste time using 
another approach when we’ve already gotten what we want?” We gave a partial answer to 
this earlier. Here is a more complete response. 


• By looking at a problem from different viewpoints, we may come to understand it 
better and so be more comfortable working similar problems in the future. 

• By looking at a problem from different viewpoints, we may discover that things we 
previously thought were unrelated have interesting connections. These connections 
might open up easier ways to solve some types of problems and may make it possible 
for us to solve problems we couldn’t do before. 

• A different point of view may lead us to a whole new approach to problems, putting 
powerful new tools at our disposal. 


In the approach we are about to take, we’ll begin to see a powerful tool for solving 
counting problems. It’s called “generating functions” and it lets us put calculus and related 
subjects to work in combinatorics. 


Suppose that S = {x_1, ..., x_n} where x_1, x_2, ... and x_n are variables as in high school 
algebra. Let P(S) = (1 + x_1)⋯(1 + x_n). The first three values of P(S) are 

n = 1: 1 + x_1 
n = 2: 1 + x_1 + x_2 + x_1x_2 
n = 3: 1 + x_1 + x_2 + x_3 + x_1x_2 + x_1x_3 + x_2x_3 + x_1x_2x_3. 


From this you should be able to convince yourself that P(S) consists of a sum of terms 
where each term represents one of the subsets of S as a product of its elements. Can we 
reach some understanding of why this is so? Yes, but we’ll only explore it briefly now. The 
understanding relates to the Rules of Sum and Product. Interpret plus as OR, times as AND 
and 1 as “nothing.” Then (1 + x_1)(1 + x_2)(1 + x_3) can be read as 


• include the factor 1 in the term OR include the factor x_1 AND 
• include the factor 1 in the term OR include the factor x_2 AND 
• include the factor 1 in the term OR include the factor x_3. 

In other words 
• omit x_1 OR include x_1 AND 
• omit x_2 OR include x_2 AND 
• omit x_3 OR include x_3. 


This is simply a description of how to form an arbitrary subset of {x_1, x_2, x_3}. On the 
other hand we can form an arbitrary subset by the rule 

• include nothing in the subset OR 
• include x_1 in the subset OR 
• include x_2 in the subset OR 
• include x_3 in the subset OR 
• include x_1 AND x_2 in the subset OR 
• include x_1 AND x_3 in the subset OR 
• include x_2 AND x_3 in the subset OR 
• include x_1 AND x_2 AND x_3 in the subset. 


If we drop the subscripts on the x_i’s, then a product representing a k-subset becomes x^k. 


We get one such term for each subset and so it follows that the coefficient of x^k in the 
polynomial f(x) = (1 + x)^n is C(n,k); that is, 

$$(1+x)^n = \sum_{k=0}^{n} C(n,k)\,x^k.$$

This expression is called a generating function for the binomial coefficients C(n, k). 
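The "omit OR include" reading of the product can be imitated in Python. In this sketch of our own, each term of (1 + x_1)⋯(1 + x_n) is represented by the frozenset of indices of the variables it contains, so each term is literally a subset. "Dropping the subscripts" then just means counting terms by size, which gives the coefficients of (1 + x)^n.

```python
from math import comb

def expand_product(n):
    """Multiply out (1 + x_1)(1 + x_2)...(1 + x_n) symbolically.

    Each term is the frozenset of indices of the variables it contains;
    the constant term 1 is the empty frozenset."""
    terms = [frozenset()]                       # the polynomial 1
    for i in range(1, n + 1):
        # each existing term: omit x_i (use the factor 1) OR include x_i
        terms = terms + [t | {i} for t in terms]
    return terms

def coefficients(n):
    """Coefficients of (1 + x)^n, obtained by dropping subscripts."""
    coeffs = [0] * (n + 1)
    for t in expand_product(n):
        coeffs[len(t)] += 1                     # a k-subset becomes x^k
    return coeffs

print(sorted(map(sorted, expand_product(3)), key=lambda s: (len(s), s)))
print(coefficients(5))                          # [1, 5, 10, 10, 5, 1]
```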


Can this help us evaluate C(n,k)? Calculus comes to the rescue through Taylor’s 
Theorem! Taylor’s Theorem tells us that the coefficient of x^k in f(x) is f^{(k)}(0)/k!. Let 
f(x) = (1 + x)^n. Taking the k-th derivative of f gives 

$$f^{(k)}(x) = n(n-1)\cdots(n-k+1)\,(1+x)^{n-k}.$$

Thus C(n,k), the coefficient of x^k in (1 + x)^n, is 

$$C(n,k) = \frac{f^{(k)}(0)}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!}.$$
We conclude this example with 


Theorem 8 (Binomial Theorem) 

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k}\, x^{n-k} y^k.$$

This follows from the identity (1 + x)^n = Σ_{k=0}^{n} C(n,k)x^k: Since (x + y)^n = x^n(1 + (y/x))^n, 
the coefficient of x^n(y/x)^k = x^{n−k}y^k in (x + y)^n is C(n,k). □


To illustrate, (x + y)^3 = C(3,0)x^3 + C(3,1)x^2y + C(3,2)xy^2 + C(3,3)y^3, which equals 
x^3 + 3x^2y + 3xy^2 + y^3. 
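The Binomial Theorem can be sanity-checked numerically. The following sketch is our own. It compares both sides for many small integer values of x, y and n. (Python evaluates 0 ** 0 as 1, which matches the convention x^0 = 1 used in the sum.)

```python
from math import comb

def binomial_expansion(x, y, n):
    """Right-hand side of the Binomial Theorem: sum of C(n,k) x^(n-k) y^k."""
    return sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))

print(all(binomial_expansion(x, y, n) == (x + y) ** n
          for x in range(-3, 4) for y in range(-3, 4) for n in range(8)))
print([comb(3, k) for k in range(4)])   # coefficients of x^3, x^2 y, x y^2, y^3
```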


Example 14 (Smorgasbord College programs) Smorgasbord College allows students 
to study in three principal areas: (a) Swiss naval history, (b) elementary theory and (c) computer 
science. The number of upper division courses offered in these fields is 2, 92, and 15 
respectively. To graduate, a student must choose a major and take 6 upper division courses 
in it, and also choose a minor and take 2 upper division courses in it. Swiss naval history 
cannot be a major because only 2 upper division courses are offered in it. How many 
programs are possible? 


The possible major-minor pairs are b-a, b-c, c-a, and c-b. By the Rule of Sum we 
can simply add up the number of programs in each combination. Those programs can be 
found by the Rule of Product. The number of major programs in (b) is C(92,6) and in 
(c) is C(15,6). For minor programs: (a) is C(2,2) =1, (b) is C(92,2) = 4186 and (c) is 
C(15,2) = 105. Since the possible programs are constructed by 


(major (b) AND (minor (a) OR minor (c))) 
OR (major (c) AND (minor (a) OR minor (b))), 

the number of possible programs is 

C(92,6)(1 + 105) + C(15,6)(1 + 4186) = 75,606,201,671, 


a rather large number. □
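The Rule of Sum and Rule of Product structure of this count can be sketched in a few lines of Python. The dictionaries `major` and `minor` are our own encoding of the course counts given in the example.

```python
from math import comb

major = {"b": comb(92, 6), "c": comb(15, 6)}                    # 6 courses in the major
minor = {"a": comb(2, 2), "b": comb(92, 2), "c": comb(15, 2)}   # 2 courses in the minor

# Rule of Sum over the allowed major-minor pairs, Rule of Product within each pair.
total = sum(major[m] * minor[n] for m in major for n in minor if n != m)
print(f"{total:,}")   # 75,606,201,671
```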
Example 15 (Card hands: Full house) Card hands provide a source of some simple 
sounding but tricky set counting problems. A standard deck of cards contains 52 cards, 
each of which is marked with two labels. The first label, called the “suit,” belongs to the 
set 


suits = {♣, ♥, ♦, ♠}, 


called club, heart, diamond and spade, respectively. (On the blackboard, we will use C, H, 
D and S rather than drawing the symbols.) The second label, called the “value,” belongs 
to the set 

values = {2,3, 4, 5, 6, 7,8, 9,10, J, Q,K, A}, 


where J, Q, K and A are jack, queen, king and ace, respectively. Each pair of labels occurs 
exactly once in the deck. A hand is a subset of a deck. Two cards are a pair if they have 
the same value. 


How many 5 card hands consist of a pair and a triple? (In poker, such a hand is called 
a “full house.” ) 


To calculate this we describe how to construct such a hand: 
• Choose the value for the pair AND 
• Choose the value for the triple different from the pair AND 
• Choose the 2 suits for the pair AND 
• Choose the 3 suits for the triple. 
This produces each full house exactly once, so the number is the product of the answers 
for the four steps, namely 

13 × 12 × C(4,2) × C(4,3) = 3,744. □


Example 16 (Card hands: Two pairs) We’ll continue with our poker hands. How 
many 5 card hands consist of two pairs? A description of a hand always means that there 
is nothing better in the hand, so “two pairs” means we don’t have a full house or four of a 
kind. 


The obvious thing to do is replace “triple” by “second pair” in the description for 
constructing a full house and add a choice for the card that belongs to no pair. This is not 
correct! Each hand is constructed twice, depending on which pair is the “second pair.” Try 
it! What happened? Before choosing the cards for a pair and a triple, we can distinguish a 
pair from a triple because a pair contains 2 cards and a triple 3. We can’t distinguish the 
two pairs, though, until the values are specified. This is an example of a situation where 
we can easily make mistakes if we forget that “AND” means “AND then.” Here’s a correct 
description, with “then” put in for emphasis. 


• Choose the values for the two pairs AND then 
• Choose the 2 suits for the pair with the larger value AND then 
• Choose the 2 suits for the pair with the smaller value AND then 
• Choose the remaining card from the 4 × 11 cards that have different values from 
the pairs. 

The number of such hands is therefore 

C(13,2) × C(4,2) × C(4,2) × 44 = 123,552. □
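The Python sketch below (our own, not from the text) gives an independent check on Examples 15 and 16. It classifies hands by how many times each value occurs. Instead of listing all C(52,5) hands, it runs over multisets of values and multiplies by the number of ways to choose suits, which is C(4,m) for a value used m times.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb, prod

VALUES = range(13)     # 13 values; which label is which does not matter here
SUITS_PER_VALUE = 4

def count_hands_by_pattern(hand_size=5):
    """Map each multiplicity pattern, e.g. (3, 2) for a full house,
    to the number of hands of `hand_size` cards with that pattern."""
    totals = Counter()
    for vals in combinations_with_replacement(VALUES, hand_size):
        mult = Counter(vals).values()
        if max(mult) > SUITS_PER_VALUE:
            continue                       # no value appears on more than 4 cards
        # choose which suits carry each value: C(4, m) ways for multiplicity m
        ways = prod(comb(SUITS_PER_VALUE, m) for m in mult)
        totals[tuple(sorted(mult, reverse=True))] += ways
    return totals

counts = count_hands_by_pattern()
print(counts[(3, 2)])         # full house: 3744
print(counts[(2, 2, 1)])      # two pairs: 123552
print(sum(counts.values()))   # all 5-card hands: 2598960
```

Sorting each pattern in decreasing order treats the two pairs symmetrically. This is exactly the point of Example 16: two pairs cannot be told apart until their values are known.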


Example 17 (Rearranging MISSISSIPPI) We are going to count the ways to “re- 
arrange” the letters in the word MISSISSIPPI. Before “rearranging” them, we should be 
precise about what we mean by “arranging” them. The distinct letters in the word MIS- 
SISSIPPI are I, M, P, and S. There are eleven letter positions in the word MISSISSIPPI 
which we can explicitly label as follows: 


 1  2  3  4  5  6  7  8  9 10 11 
 M  I  S  S  I  S  S  I  P  P  I 


We can describe this placement of letters by a rule such as 

I ← {2,5,8,11}, M ← {1}, P ← {9,10}, and S ← {3,4,6,7}. 

If we remember the ordering (alphabetic in this case), I, M, P, S, then we can specify this 
arrangement by the ordered partition 


({2,5,8, 11}, {1}, {9, 10}, {3, 4, 6, 7}) 


of the set {1,2,...,11}.⁵ We say that this ordered partition is of type (4,1,2,4), referring 
to the sizes of the sets, in order, that make up the ordered partition. Each of these 
sets is called a block or, in statistics, a cell. In general, an ordered partition of a set T of 
type (m_1, m_2, ..., m_k) is a sequence of disjoint sets (B_1, B_2, ..., B_k) such that |B_i| = m_i, 
i = 1, 2, ..., k, and B_1 ∪ B_2 ∪ ⋯ ∪ B_k = T. Empty sets are allowed in ordered partitions. The set 
of all rearrangements of the letters in the word MISSISSIPPI corresponds to the set of all 
ordered partitions (B_1, B_2, B_3, B_4) of {1,2,...,11} of type (4,1,2,4). For example, the 
ordered partition ({1,5,7,10}, {2}, {9,11}, {3,4,6,8}) corresponds to the placement 


I ← {1,5,7,10}, M ← {2}, P ← {9,11}, and S ← {3,4,6,8} 

and leads to the “word” 

 1  2  3  4  5  6  7  8  9 10 11 
 I  M  S  S  I  S  I  S  P  I  P 
Another, somewhat picturesque, way of describing ordered partitions of a set T is to think 
of ordered (i.e., labeled) boxes (B_1, B_2, ..., B_k) into which we distribute the elements of 
T, m_i elements to box B_i, i = 1, ..., k. The next example takes that point of view and 
concludes that the number of such distributions of elements into boxes (i.e., the number of 
ordered partitions) is the multinomial coefficient (with n = |T|) 

$$\binom{n}{m_1, m_2, \ldots, m_k} = \frac{n!}{m_1!\,m_2!\cdots m_k!}.$$


As a result, the number of rearrangements of the word MISSISSIPPI is the multinomial 
coefficient 

$$\binom{11}{4,1,2,4} = \frac{11!}{4!\,1!\,2!\,4!} = 34{,}650.$$ □


⁵ Note the use of (...) and {...} here: We have a list, indicated by (...). Each element 
of the list is a set, indicated by {...}. 
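The correspondence between ordered partitions and words in Example 17 can be sketched in Python. The helper names `word_from_partition`, `ordered_partitions` and `multinomial` are our own. The sketch decodes the two ordered partitions given in the text. It then generates every ordered partition of type (4,1,2,4) and checks that the number of distinct words matches the multinomial coefficient.

```python
from itertools import combinations
from math import factorial

def multinomial(*ms):
    """n! / (m_1! m_2! ... m_k!) where n = m_1 + ... + m_k."""
    result = factorial(sum(ms))
    for m in ms:
        result //= factorial(m)
    return result

def word_from_partition(letters, blocks):
    """Turn an ordered partition of positions {1, ..., n} into a word:
    letters[i] is placed at every position in blocks[i]."""
    n = sum(len(b) for b in blocks)
    word = [None] * n
    for letter, block in zip(letters, blocks):
        for pos in block:
            word[pos - 1] = letter        # positions are numbered from 1
    return "".join(word)

def ordered_partitions(positions, sizes):
    """Yield every ordered partition (B_1, ..., B_k) of `positions`
    with |B_i| = sizes[i]."""
    if not sizes:
        yield ()
        return
    for first in combinations(sorted(positions), sizes[0]):
        rest = set(positions) - set(first)
        for tail in ordered_partitions(rest, sizes[1:]):
            yield (set(first),) + tail

print(word_from_partition("IMPS", [{2, 5, 8, 11}, {1}, {9, 10}, {3, 4, 6, 7}]))
print(word_from_partition("IMPS", [{1, 5, 7, 10}, {2}, {9, 11}, {3, 4, 6, 8}]))
words = {word_from_partition("IMPS", p)
         for p in ordered_partitions(range(1, 12), (4, 1, 2, 4))}
print(len(words), multinomial(4, 1, 2, 4))   # 34650 34650
```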
Example 18 (Multinomial coefficients) Suppose we are given k boxes labeled 1 
through k and an n-set S and we are told to distribute the elements of S among the 
boxes so that the ith box contains exactly m_i elements. How many ways can this be done? 


Let n = |S|. Unless m_1 + ⋯ + m_k = n, the answer is zero because we don’t have the 
right number of objects. Therefore, we assume from now on that 

m_1 + ⋯ + m_k = n. 


Here’s a way to describe filling the boxes. 
• Fill the first box. (There are C(n, m_1) ways.⁶) AND 
• Fill the second box. (There are C(n − m_1, m_2) ways.) AND 
  ⋮ 
• Fill the kth box. (There are C(n − (m_1 + ⋯ + m_{k−1}), m_k) = C(m_k, m_k) = 1 way.) 


Now apply the Rule of Product, use the formula C(p,q) = p!/(q!(p − q)!) everywhere, and 
cancel common factors in numerator and denominator to obtain n!/(m_1! m_2! ⋯ m_k!). To 
illustrate, 

$$\binom{12}{4}\binom{12-4}{3}\binom{12-4-3}{3} = \frac{12!}{4!\,8!}\cdot\frac{8!}{3!\,5!}\cdot\frac{5!}{3!\,2!} = \frac{12!}{4!\,3!\,3!\,2!},$$

which we write $\binom{12}{4,3,3,2}$. In general, this expression is written 

$$\binom{n}{m_1, m_2, \ldots, m_k} = \frac{n!}{m_1!\,m_2!\cdots m_k!},$$

where n = m_1 + m_2 + ⋯ + m_k, and is called a multinomial coefficient. In multinomial 
notation, the binomial coefficient $\binom{n}{k}$ would be written $\binom{n}{k,\,n-k}$. You can think of the first 
box as the k things that are chosen and the second box as the n − k things that are not 
chosen. 


As in the previous example (Example 17), we can think of the correspondence 


objects being distributed <=> positions in a word 
boxes <=> letters. 


If the object “position 3” is placed in the box “D,” then the letter D appears as the third 
letter in the word. The multinomial coefficient is then the number of words that can be 
made so that letter i appears exactly m_i times. A word can be thought of as a list of its 
letters. □


⁶ Since m_1 things went into the first box, we have only n − m_1 left, from which we must 
choose m_2 for the second box. 
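The box-filling argument of Example 18 can be sketched in Python. The function names are our own. `fill_boxes` follows the Rule of Product step by step, and `multinomial` uses the closed form, so the two results should agree.

```python
from math import comb, factorial

def fill_boxes(n, ms):
    """Ways to put m_1 objects in box 1, then m_2 in box 2, and so on:
    C(n, m_1) * C(n - m_1, m_2) * ... (Rule of Product)."""
    if sum(ms) != n:
        return 0                    # wrong number of objects
    ways, left = 1, n
    for m in ms:
        ways *= comb(left, m)       # choose which remaining objects go in this box
        left -= m
    return ways

def multinomial(ms):
    """n! / (m_1! m_2! ... m_k!) with n = m_1 + ... + m_k."""
    result = factorial(sum(ms))
    for m in ms:
        result //= factorial(m)
    return result

print(fill_boxes(12, [4, 3, 3, 2]), multinomial([4, 3, 3, 2]))   # 277200 277200
print(fill_boxes(11, [2, 3, 3, 3]))                              # 92400
```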
Example 19 (Distributing toys) Eleven toys are to be distributed among 4 children. 
How many ways can this be done if the oldest child is to receive only 2 toys and each of 
the other children is to receive 3 toys? 


We can do this directly if we are used to thinking in terms of multinomial coefficients. 
We could also do it by converting the problem into one of our previous interpretations. 


Here is the first: We want an ordered partition of 11 toys into 4 piles (“blocks”) 
such that the first pile (for the oldest child) contains 2 and each of the 3 remaining piles 
contains 3 toys. This is an ordered partition of type (2,3,3,3). The number of them is 
$\binom{11}{2,3,3,3}$ = 92,400. 


Here is the second: Think of each child as a box into which we place toys. The number 
of ways to fill the boxes is, again, $\binom{11}{2,3,3,3}$. □


Example 20 (Words from a collection of letters — second try) Using the idea at 
the end of the previous example, we can more easily count the words that can be made 
from ERROR, a problem discussed in Example 10. Suppose we want to make words of 
length k. Let m_1 be the number of E’s, m_2 the number of O’s and m_3 the number of R’s. 
By considering all possible cases for the number of each letter, you should be able to see 
that the answer is the sum of $\binom{k}{m_1,m_2,m_3}$ over all m_1, m_2, m_3 such that 


m_1 + m_2 + m_3 = k,  0 ≤ m_1 ≤ 1,  0 ≤ m_2 ≤ 1,  0 ≤ m_3 ≤ 3. 


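Thus we obtain 

$$\begin{aligned}
k=1:&\quad \binom{1}{0,0,1}+\binom{1}{0,1,0}+\binom{1}{1,0,0} = 3\\
k=2:&\quad \binom{2}{0,0,2}+\binom{2}{0,1,1}+\binom{2}{1,0,1}+\binom{2}{1,1,0} = 7\\
k=3:&\quad \binom{3}{0,0,3}+\binom{3}{0,1,2}+\binom{3}{1,0,2}+\binom{3}{1,1,1} = 13\\
k=4:&\quad \binom{4}{0,1,3}+\binom{4}{1,0,3}+\binom{4}{1,1,2} = 20\\
k=5:&\quad \binom{5}{1,1,3} = 20.
\end{aligned}$$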


This is better than in Example 10. Instead of having to list words, we have to list triples 
of numbers and each triple generally corresponds to more than one word. Here are the lists 
of triples for the preceding computations 


k = 1: (0,0,1) (0,1,0) (1,0,0) 

k = 2: (0,0,2) (0,1,1) (1,0,1) (1,1,0) 

k = 3: (0,0,3) (0,1,2) (1,0,2) (1,1,1) 

k = 4: (0,1,3) (1,0,3) (1,1,2) 

k = 5: (1,1,3) □
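The computation in Example 20 can be sketched in Python by summing multinomial coefficients over the allowed triples. The function `words_from_counts` and its `limits` parameter are our own. It reproduces the counts 3, 7, 13, 20, 20 found by listing words in Example 10.

```python
from itertools import product
from math import factorial

def multinomial(ms):
    """k! / (m_1! m_2! ... m_r!) with k = m_1 + ... + m_r."""
    result = factorial(sum(ms))
    for m in ms:
        result //= factorial(m)
    return result

def words_from_counts(k, limits):
    """Sum of multinomial(k; m_1, ..., m_r) over all (m_1, ..., m_r) with
    m_1 + ... + m_r = k and 0 <= m_i <= limits[i]."""
    ranges = [range(limit + 1) for limit in limits]
    return sum(multinomial(ms) for ms in product(*ranges) if sum(ms) == k)

# ERROR: at most one E, one O and three R's.
print([words_from_counts(k, [1, 1, 3]) for k in range(1, 6)])   # [3, 7, 13, 20, 20]
```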
Example 21 (Forming teams) How many ways can we form 4 teams from 12 people 
so that each team has 3 members? This is another multinomial coefficient (ordered set 
partition) problem and the answer is $\binom{12}{3,3,3,3}$ = 369,600. 


Wait! We forgot to tell you that the teams don’t have names or any other distinguishing 
features except who the team members are. The solution that gave 369,600 created a list 
of teams, so there was a Team 1, Team 2, Team 3 and Team 4. We can deal with this the 
same way we got the formula for counting subsets: To form a list of 4 teams, first form 
a set and then order it. Since 4 distinct things can be ordered in 4! = 24 ways, we have 

369,600 = 24x, where x is our answer, so x = 15,400. 


If we had told you in the first place that the teams were not ordered, you might not have 
thought of multinomial coefficients. This leads to two points. 


• It may be helpful to impose order and then divide it out. 


• We have found a way to count unordered partitions when all the blocks are the same 
size. This can be extended to the general case of blocks of various sizes but we will 
not do so. 


Wait! We forgot to tell you that we are going to form 4 teams, pair them up to play 
each other in a contest, say the team with Alice plays the team with Bob, and the other two 
teams play each other. The winners then play each other. Now we have to form the teams 
and divide them into pairs that play each other. Let’s do that. Suppose we have formed 
4 unordered teams. Now we must pair them off. This is another unordered partition: The 
four teams must be partitioned into two blocks each of size 2. From what we learned in 
the previous paragraph, we compute $\binom{4}{2,2}$ = 6 and divide by 2!, obtaining 3. Thus the answer 
is 15,400 × 3 = 46,200. □
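The "impose order, then divide it out" idea can be checked by brute force. The generator below is our own sketch. It builds each unordered partition exactly once by always putting the smallest remaining person into the next team, so no division is needed. The results match the counts 369,600, 15,400, 3 and 46,200 obtained in the text.

```python
from itertools import combinations
from math import factorial

def unordered_partitions(people, block_size):
    """Yield each partition of `people` into unlabeled blocks of size
    `block_size` exactly once.  The smallest remaining person always starts
    the next block, so the same partition is never produced in two orders."""
    people = sorted(people)
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for mates in combinations(rest, block_size - 1):
        remaining = [p for p in rest if p not in mates]
        for others in unordered_partitions(remaining, block_size):
            yield [(first,) + mates] + others

ordered_teams = factorial(12) // factorial(3) ** 4             # lists of 4 teams
teams = sum(1 for _ in unordered_partitions(range(12), 3))
pairings = sum(1 for _ in unordered_partitions(range(4), 2))   # 4 teams into 2 matches
print(ordered_teams, teams, ordered_teams // factorial(4))     # 369600 15400 15400
print(pairings, teams * pairings)                              # 3 46200
```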


Example 22 (Card hands and multinomial coefficients) To form a full house, we 
must choose a face value for the triple, choose a face value for the pair, and leave eleven 
face values unused. This can be done in $\binom{13}{1,1,11}$ ways. We then choose the suits for the 
triple in $\binom{4}{3}$ ways and the suits for the pair in $\binom{4}{2}$ ways. Note that we choose suits only for 
the cards in the hand, not for the “unused face values.” 


To form two pair, we must choose two face values for the pairs, choose a face value for 
the single card, and leave ten face values unused. This can be done in $\binom{13}{2,1,10}$ ways. We 
then choose suits for each of the face values in turn, so we must multiply by $\binom{4}{2}\binom{4}{2}\binom{4}{1}$. 


Let’s imagine an eleven card hand containing two triples, a pair and three single cards. 
You should be able to see that the number of ways to do this is 


$$\binom{13}{2,1,3,7}\binom{4}{3}\binom{4}{3}\binom{4}{2}\binom{4}{1}\binom{4}{1}\binom{4}{1}.$$ □
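The recipe in Example 22 generalizes as follows. First group the face values by how often they are used, with an extra "unused" group, and count those groupings with a multinomial coefficient. Then choose suits for each used value. The following is a sketch of that recipe, and `count_hands` is a hypothetical helper name of ours.

```python
from collections import Counter
from math import comb, factorial, prod

def multinomial(*ms):
    """n! / (m_1! m_2! ... m_k!) where n = m_1 + ... + m_k."""
    result = factorial(sum(ms))
    for m in ms:
        result //= factorial(m)
    return result

def count_hands(multiplicities, values=13, suits=4):
    """Number of hands whose face values occur with the given multiplicities,
    e.g. [3, 2] for a full house or [2, 2, 1] for two pairs.

    Face values are distributed into boxes labeled by multiplicity
    ("used 3 times", "used 2 times", ..., "unused"); then suits are chosen
    for each used value."""
    by_mult = Counter(multiplicities)        # multiplicity -> number of values
    unused = values - len(multiplicities)
    value_ways = multinomial(*by_mult.values(), unused)
    suit_ways = prod(comb(suits, m) for m in multiplicities)
    return value_ways * suit_ways

print(count_hands([3, 2]))               # full house: 3744
print(count_hands([2, 2, 1]))            # two pairs: 123552
print(count_hands([3, 3, 2, 1, 1, 1]))   # the eleven-card hand
```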


We conclude this section with an introduction to recursions. Let’s explore yet another 
approach to evaluating the binomial coefficient C(n,k) = $\binom{n}{k}$. Let S = {x_1, ..., x_n}. We'll 
think of C(n,k) as counting k-subsets of S. Either the element x_n is in our subset or it is 
not. The cases where it is in the subset are all formed by taking the various (k − 1)-subsets 
of S − {x_n} and adding x_n to them. The cases where it is not in the subset are all formed 
by taking the various k-subsets of S − {x_n}. What we’ve done is describe how to build 
k-subsets of S from certain subsets of S − {x_n}. Since this gives each subset exactly once, 

$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$$

by the Rule of Sum. 


The equation C(n,k) = C(n − 1, k − 1) + C(n − 1, k) is called a recursion because it 
tells how to compute C(n,k) from values of the function with smaller arguments. This is 
a common approach which we can state in general form as follows. 


Example 23 (Deriving recursions) To count things, you might ask and answer the 
question 


How can I construct the things I want to count of a given size by using 
the same type of things of a smaller size? 
This process usually gives rise to a recursion. 
Actually, we’ve cheated a bit in all of this because the recursion only works when we 
have some values to start with. The correct statement of the recursion is either 
C(0,0) = 1, 
C(0,k) = 0 for k ≠ 0, and 
C(n,k) = C(n − 1, k − 1) + C(n − 1, k) for n > 0; 

or 

C(1,0) = C(1,1) = 1, 
C(1,k) = 0 for k ≠ 0, 1, and 
C(n,k) = C(n − 1, k − 1) + C(n − 1, k) for n > 1; 


depending on how we want to start the computations based on this recursion. Below we 
have made a table of values for C(n,k). Sometimes this tabular representation of C(n,k) 
is called “Pascal’s Triangle.” 
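The first form of the recursion is enough to generate Pascal's Triangle row by row. In the following Python sketch (our own), each entry comes only from the starting values and the recursion, with no factorials.

```python
def pascal_rows(n_max):
    """Rows 0..n_max of C(n, k), computed only from C(0, 0) = 1,
    C(0, k) = 0 for k != 0, and C(n, k) = C(n-1, k-1) + C(n-1, k) for n > 0."""
    rows = [[1]]                                   # row n = 0
    for n in range(1, n_max + 1):
        prev = rows[-1]                            # row n - 1 holds k = 0..n-1
        row = []
        for k in range(n + 1):
            left = prev[k - 1] if k >= 1 else 0    # C(n-1, k-1), zero when k = 0
            right = prev[k] if k <= n - 1 else 0   # C(n-1, k), zero when k = n
            row.append(left + right)
        rows.append(row)
    return rows

for row in pascal_rows(6):
    print(row)
```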


Sometimes it is easier to think in terms of “breaking down” rather than “constructing.” 
That is, ask the question 
How can I break down the things I want to count into smaller things of 
the same type? 


Let’s look at the binomial coefficients again. What happens to the k-subsets of the set 
S = {x_1, ..., x_n} if we throw away x_n? We then have subsets of S \ {x_n} = {x_1, ..., x_{n−1}}. 
The k-subsets of S that did not contain x_n are still k-subsets, but those that contained 
x_n have become (k − 1)-subsets. We get all k-subsets and all (k − 1)-subsets of S \ {x_n} 
exactly once when we do this. Thus C(n,k) = C(n − 1, k) + C(n − 1, k − 1) by the Rule of 
Sum. □


Example 24 (Set partitions) A partition of a set B is a collection of nonempty 
subsets of B such that each element of B appears in exactly one subset. Each subset is 
called a block of the partition. The 15 partitions of {1,2,3,4} by number of blocks are 
1 block:  {{1,2,3,4}} 
2 blocks: {{1,2,3},{4}}  {{1,2,4},{3}}  {{1,2},{3,4}}  {{1,3,4},{2}} 
          {{1,3},{2,4}}  {{1,4},{2,3}}  {{1},{2,3,4}} 
3 blocks: {{1,2},{3},{4}}  {{1,3},{2},{4}}  {{1,4},{2},{3}}  {{1},{2,3},{4}} 
          {{1},{2,4},{3}}  {{1},{2},{3,4}} 
4 blocks: {{1},{2},{3},{4}} 


Let S(n, k) be the number of partitions of an n-set having exactly k blocks. These are 
called Stirling numbers of the second kind. Do not confuse S(n,k) with C(n,k) = $\binom{n}{k}$. In 
both cases we have an n-set. For C(n,k) we want to choose a subset containing k elements 
and for S(n,k) we want to partition the set into k blocks. 


What is the value of S(n,k)? Let’s try to get a recursion. How can we build partitions 
of {1,2,...,n} with k blocks out of smaller cases? If we take partitions of {1,2,...,n−1} 
with k−1 blocks, we can simply add the block {n}. If we take partitions of {1,2,...,n−1} 
with k blocks, we can add the element n to one of the k blocks. You should convince yourself 
that all k-block partitions of {1,2,...,n} arise in exactly one way when we do this. This 
gives us a recursion for S(n,k). Putting n in a block by itself contributes S(n − 1, k − 1). 
Putting n in a block with other elements contributes S(n − 1, k) × k by the Rule of Product. 
By the Rule of Sum 

S(n,k) = S(n − 1, k − 1) + k S(n − 1, k). 
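The building process just described can be turned into a generator of all set partitions. In this Python sketch of our own, every partition of {1, ..., n−1} either gets the new block {n} or has n added to one of its blocks. For n = 4 it reproduces the 15 partitions listed above, split by number of blocks as 1, 7, 6, 1.

```python
def set_partitions(n):
    """All partitions of {1, ..., n}, each given as a list of blocks (lists).

    Built as in the text: from each partition of {1, ..., n-1}, either add
    the block {n}, or add n to one of the existing blocks."""
    if n == 0:
        return [[]]                      # the empty set has one partition
    result = []
    for p in set_partitions(n - 1):
        result.append(p + [[n]])         # n in a block by itself
        for i in range(len(p)):          # n joins block i
            result.append(p[:i] + [p[i] + [n]] + p[i + 1:])
    return result

parts = set_partitions(4)
print(len(parts))                                                   # 15
print([sum(1 for p in parts if len(p) == k) for k in range(1, 5)])  # [1, 7, 6, 1]
```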


Let’s take a tearing down view. If we remove n from the set {1,...,n} and from the 
block of the partition in which it occurs: 


• We get a partition counted by S(n − 1, k − 1) if n was in a block by itself because that 
block disappears. 


• We get a partition counted by S(n − 1, k) if n was in a block with other things. In 
fact, we get each of these partitions k times since n could have been in any of the k 
blocks. 


This gives us our recursion S(n,k) = S(n − 1, k − 1) + k S(n − 1, k) again. 


To illustrate, let’s look at what happens when we remove 4 from our earlier list of 
3-block partitions: 


3 blocks: {{1,2},{3},{4}}  {{1,3},{2},{4}}  {{1,4},{2},{3}}  {{1},{2,3},{4}} 
          {{1},{2,4},{3}}  {{1},{2},{3,4}} 

The partitions with singleton blocks {4} removed give us the partitions 


{{1,2},{3}}  {{1,3},{2}}  {{1},{2,3}}. 


Thus the partitions counted by S(3,2) each occur once. The partitions in which 4 is not 
in a singleton block, with 4 removed, give us the partitions 


{{1},{2},{3}}  {{1},{2},{3}}  {{1},{2},{3}}. 


Thus the partitions counted by S(3,3) (there’s only one) each occur 3 times. Hence 
S(4,3) = S(3,2) + 3S(3,3) = 3 + 3 = 6. 


Below is the tabular form for S(n,k) analogous to the tabular form for C(n,k).

  n\k   1    2    3    4    5    6    7
   1    1
   2    1    1
   3    1    3    1
   4    1    7    6    1
   5    1   15   25   10    1
   6    1   31   90   65   15    1
   7

Notice that the starting conditions for this table are that S(n,1) = 1 for all n ≥ 1 and
S(n,n) = 1 for all n ≥ 1. The values for n = 7 are omitted from the table. You should
fill them in to test your understanding of this computational process. For each n, the total
number of partitions of a set of size n is equal to the sum S(n,1) + S(n,2) + ... + S(n,n).
These numbers, gotten by summing the entries in the rows of the above table, are called
the Bell numbers, $B_n$. For example, $B_4 = 1 + 7 + 6 + 1 = 15$. □
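The table can be generated mechanically from the starting conditions and the recursion. Here is a short Python sketch (the helper names `stirling2_table` and `bell_numbers` are our own) that fills in rows 1 through `n_max`. Calling it with `n_max = 7` would give the row the text asks you to work out by hand, so you may want to try that yourself first.

```python
def stirling2_table(n_max):
    """Return S with S[n][k] = S(n, k) for 1 <= k <= n <= n_max (other entries 0)."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    for n in range(1, n_max + 1):
        S[n][1] = 1          # starting condition S(n, 1) = 1
        S[n][n] = 1          # starting condition S(n, n) = 1
        for k in range(2, n):
            # S(n, k) = S(n-1, k-1) + k * S(n-1, k)
            S[n][k] = S[n - 1][k - 1] + k * S[n - 1][k]
    return S


def bell_numbers(S):
    """Row sums B_n = S(n, 1) + ... + S(n, n) for n = 1, ..., len(S) - 1."""
    return [sum(S[n][1:n + 1]) for n in range(1, len(S))]


S = stirling2_table(6)
for n in range(1, 7):
    print(n, S[n][1:n + 1])
print(bell_numbers(S))
```

Each entry only needs the row above it, so filling the table row by row is the same process you would carry out by hand.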


Exercises for Section 3 


3.1. How many 6 card hands contain 3 pairs? 


3.2. How many 5 card hands contain a straight? A straight is 5 consecutive cards from 
the sequence A,2,3,4,5,6,7,8,9,10,J,Q,K,A without regard to suit. 


3.3. How many compositions of n (sequences of positive integers called “parts” that
add to n) are there that have exactly k parts? A composition of 5, for example,
corresponds to a placement of either a “+” or a “,” in the four spaces between a
sequence of five ones: 11111. Thus, the placement 1,1+1,1+1 corresponds to
the composition (1,2,2) of 5, which has 3 parts.


3.4. How many rearrangements of the letters in EXERCISES are there? 


3.5. In some card games only the values of the cards matter and their suits are irrelevant. 
Thus there are effectively only 13 distinct cards among 52 total. How many different 
ways can a deck of 52 cards be arranged in this case? The answer is a multinomial 
coefficient. 


3.6. In a distant land, their names are spelled using the letters A, I, L, S, and T. 
Each name consists of seven letters. Each name begins and ends with a conso- 
nant, contains no adjacent vowels and never contains three adjacent consonants. If 
two consonants are adjacent, they cannot be the same. An example of a name is 
LASLALS, but LASLASS and LASLAAS are not names. 


(a) List the first 4 names in dictionary order. 
(b) List the last 4 names in dictionary order. 


(c) How many names are possible? 


3.7. Prove that $\binom{n}{k} = \binom{n}{n-k}$ and $\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n$.


3.8. For n > 0, prove the following formulas for S(n,k):

$$S(n,n) = 1, \quad S(n,n-1) = \binom{n}{2}, \quad S(n,1) = 1, \quad S(n,2) = (2^n - 2)/2 = 2^{n-1} - 1.$$


3.9. Let $B_n$ be the total number of partitions of an n element set. Thus
$B_n = S(n,1) + S(n,2) + \cdots + S(n,n)$ for n > 0.

These numbers are called the Bell numbers.

(a) Prove that

$$B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_{n-k},$$

where $B_0$ is defined to be 1.

Hint: To construct a partition, first construct the block containing n + 1 and
then construct the rest of the partition. If you prefer tearing down instead of
building up, remove the block containing n + 1.

(b) Calculate $B_n$ for n ≤ 5 by using the formula in (a).


3.10. We consider permutations $a_1, \ldots, a_9$ of 1,2,3,4,5,6,7,8,9.
(a) How many have the property that $a_i < a_{i+1}$ for all i ≤ 8?

(b) How many have the property that $a_i < a_{i+1}$ for all i ≤ 8 except i = 5?
Section 4: Probability and Basic Counting


Techniques of counting are very important in probability theory. In this section, we take a 
look at some of the basic ideas in probability theory and relate these ideas to our counting 
techniques. This requires, for the most part, a minor change of viewpoint and of terminol- 
ogy. 


Let U be a set and suppose for now that U is finite. We think of U as a “universal 
set” in the sense that we are going to be concerned with various subsets of U and their 
relationship with each other. In probability theory, the term “universal set” is replaced 
by sample space. Thus, let U be a sample space. We say that we “choose an element of 
U uniformly at random” if we have a method of selecting an element of U such that all 
elements of U have the same chance of being selected. This definition is, of course,
self-referential and pretty sloppy, but it has intuitive appeal to anyone who has selected people
for a sports team, or for a favored task at camp, and attempted to be fair about it. We 
leave it at this intuitive level. 


The quantitative way that we say that we are selecting uniformly at random from a 
sample space U is to say that each element of U has probability 1/|U| of being selected. 


A subset E ⊆ U is called an event in probability theory. If we are selecting uniformly
at random from U, the probability that our selection belongs to the set E is |E|/|U|. At
this point, basic probability theory involves nothing more than counting (i.e., we need to
count to get |E| and |U|).


A more general situation arises when the method of choosing is not “fair” or “uniform.” 
Suppose U = {H,T} is a set of two letters, H and T. We select either H or T by taking a 
coin and flipping it. If “heads” comes up, we choose H; otherwise we choose T. The coin,
typically, will be dirty, have scratches in it, etc., so the “chance” of H being chosen might
be different from the chance of T being chosen. If we wanted to do a bit of work, we could
flip the coin 1000 times and keep some records. Interpreting these records might be a bit
tricky in general, but if we came out with 400 heads and 600 tails, we might suspect that
tails was more likely. It is possible to be very precise about this sort of experiment (the
subject of statistics is all about this sort of thing). But for now, let’s just suppose that the 
“probability” of choosing H is 0.4 and the probability of choosing T is 0.6. Intuitively, we 
mean by this that if you toss the coin a large number N of times, about 0.4N will be heads 
and 0.6N will be tails. The function P with domain U = {H,T} and values P(H) = 0.4 
and P(T) = 0.6 is an example of a “probability function” on a sample space U. 


The more general definition is as follows: 


Definition 4 (Probability function and probability space) Let U be a finite
sample space and let P be a function from U to R (the real numbers) such that P(t) ≥ 0
for all t ∈ U and $\sum_{t \in U} P(t) = 1$.

• P is called a probability function on U.
• The pair (U, P) is called a probability space.
• We extend P to events E ⊆ U by defining $P(E) = \sum_{t \in E} P(t)$. P(E) is called the
probability of the event E. (If t ∈ U, we write P(t) and P({t}) interchangeably.)

An element t ∈ U is called an elementary event or a simple event.

Note that since P(t) ≥ 0 for all t, it follows from $\sum_{t \in U} P(t) = 1$ that P(t) ≤ 1.


Think of U as a set of elementary events that can occur. Each time we do an experiment
or observe something, exactly one of the elementary events in U occurs. Imagine repeating
this many times. Think of P(t) as the fraction of the cases where the elementary event
t occurs. The equation $\sum_{t \in U} P(t) = 1$ follows from the fact that exactly one elementary
event occurs each time we do our experiment. Think of P(E) as the fraction of time an
elementary event in E occurs.


Theorem 9 (Disjoint events) Suppose that (U,P) is a probability space and that X
and Y are disjoint subsets of U; that is, X ∩ Y = ∅. Then P(X ∪ Y) = P(X) + P(Y).

Proof: By definition, P(X ∪ Y) is the sum of P(t) over all t ∈ X ∪ Y. If t ∈ X ∪ Y, then
either t ∈ X or t ∈ Y, but not both because X ∩ Y = ∅. Thus we can split the sum into
two sums, one over t ∈ X and the other over t ∈ Y. These two sums are P(X) and P(Y),
respectively. Thus P(X ∪ Y) = P(X) + P(Y).

We could rephrase this using summation notation:

$$P(X \cup Y) = \sum_{t \in X \cup Y} P(t) = \sum_{t \in X} P(t) + \sum_{t \in Y} P(t) = P(X) + P(Y),$$

where we could split the sum into two sums because t ∈ X ∪ Y means that either t ∈ X or
t ∈ Y, but not both because X ∩ Y = ∅. □
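Definition 4 and Theorem 9 translate directly into code. In this minimal Python sketch (the function names are ours, and the loaded die is a made-up example, not from the text), a probability space is a dictionary from elementary events to probabilities. Exact `Fraction` values avoid rounding problems when checking that the probabilities sum to 1.

```python
from fractions import Fraction


def is_probability_function(P):
    """Definition 4: every P(t) >= 0 and the values sum to exactly 1."""
    return all(p >= 0 for p in P.values()) and sum(P.values()) == 1


def prob(P, E):
    """P(E) = sum of P(t) over the elementary events t in E."""
    return sum((P[t] for t in E), Fraction(0))


# The coin from the text: P(H) = 0.4, P(T) = 0.6.
coin = {"H": Fraction(2, 5), "T": Fraction(3, 5)}

# A hypothetical loaded die: 6 comes up half the time.
die = {t: Fraction(1, 10) for t in range(1, 6)}
die[6] = Fraction(1, 2)

X, Y = {1, 2}, {6}   # disjoint events
print(is_probability_function(coin), is_probability_function(die))
print(prob(die, X | Y), prob(die, X) + prob(die, Y))   # equal, by Theorem 9
```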


Example 25 (Dealing a full house) What is the probability of being dealt a full
house? There are $\binom{52}{5}$ distinct hands of cards, so we could simply divide the answer 3,744
from Example 15 by this number. That gives the correct answer, but there is another way
to think about the problem.

When a hand of cards is dealt, the order in which you receive the cards matters. Thus
receiving a particular three, six and two in that order is a different dealing of the cards than
receiving the same two, three and six in that order. Thus, we regard each of the
52 × 51 × 50 × 49 × 48 ways of dealing five cards from 52 as equally likely, so each ordered
deal has probability 1/(52 × 51 × 50 × 49 × 48). Since all the cards in a hand of five cards
are different, they can be ordered in 5! ways. Hence the probability of being dealt a full
house is

$$\frac{3{,}744 \times 5!}{52 \times 51 \times 50 \times 49 \times 48},$$

which does indeed equal 3,744 divided by $\binom{52}{5}$.
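As a numerical check that the two approaches agree, the sketch below computes both ratios exactly with Python's `fractions` and `math` modules (Python 3.8 or later for `math.comb` and `math.perm`). The count 3,744 is the one from Example 15; the breakdown in the comment is the usual one (13 ranks for the triple, C(4,3) suit choices, 12 ranks for the pair, C(4,2) suit choices).

```python
from fractions import Fraction
from math import comb, factorial, perm

# 3,744 full houses (Example 15): 13 * C(4,3) * 12 * C(4,2).
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)

# Unordered view: every 5-card hand is equally likely.
p_unordered = Fraction(full_houses, comb(52, 5))

# Ordered view: every sequence of 5 dealt cards is equally likely,
# and each hand can be dealt in 5! orders.
p_ordered = Fraction(full_houses * factorial(5), perm(52, 5))

print(full_houses, p_unordered == p_ordered, float(p_unordered))
```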


If cards are not all distinct and if we are not careful, the two approaches give different 
answers. The first approach gives the wrong answer. We now explain why. Be prepared to 
think carefully, because this is a difficult concept for beginning students. 


To illustrate, consider a deck of 4 cards that contains two aces of spades and two jacks
of diamonds. There are 3 possible two-card hands: 2 aces, 1 ace and 1 jack, or 2 jacks, but
the probability of getting two aces is only 1/6, not 1/3. Can you see how to calculate that correctly?
We can look at this in at least two ways. Suppose we are being dealt the top two cards.
The probability of getting two aces equals the fraction of ways to assign positions to cards
so that the top two are given to aces. There are $\binom{4}{2} = 6$ ways to assign positions to aces and
only one of those results in the aces being in the top two positions.

Here’s the other way to look at it: Mark the cards so that the aces can be told apart,
and the jacks can be told apart, say $A_1$, $A_2$, $J_1$, and $J_2$. Since the cards are distinct, each
hand can be ordered in the same number of ways, namely 2!, and so we can count ordered
or unordered hands. There are now $\binom{4}{2} = 6$ unordered hands (or 4 × 3 = 12 ordered ones) and only
one of these (or 2 × 1 = 2 ordered ones) contains $A_1$ and $A_2$. □
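The marked-card argument is easy to confirm by brute force. This Python sketch uses the labels A1, A2, J1, J2 from the text; it enumerates unordered and ordered two-card hands, and also shows why treating the three “kinds” of hands as equally likely would wrongly give 1/3.

```python
from fractions import Fraction
from itertools import combinations, permutations

deck = ["A1", "A2", "J1", "J2"]          # marked cards, so all four are distinct
aces = {"A1", "A2"}

hands = list(combinations(deck, 2))      # unordered hands: C(4,2) = 6
deals = list(permutations(deck, 2))      # ordered deals: 4 * 3 = 12

p_hands = Fraction(sum(set(h) == aces for h in hands), len(hands))
p_deals = Fraction(sum(set(d) == aces for d in deals), len(deals))

# Forgetting the marks leaves only 3 kinds of hand (AA, AJ, JJ),
# but they are not equally likely, so 1/3 would be wrong.
kinds = {tuple(sorted(card[0] for card in h)) for h in hands}

print(p_hands, p_deals, sorted(kinds))
```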


Example 26 (Venn diagrams and probability) A “Venn diagram” shows the relationship
between elements of sets. The interior of the rectangle in the following figure
represents the sample space U. The interior of each of the circular regions represents the
events A and B.

[Figure: Venn diagram of events A and B inside the rectangle U, with regions labeled 1 to 4.]

Let’s list what each of the regions in the figure is:

1 is $(A \cup B)^c$,   2 is A - B,   3 is A ∩ B,   4 is B - A.


We can compute either set cardinalities or probabilities. For example, U \ A is all of
U except what is in the region labeled A. Thus |U \ A| = |U| - |A|. On the other hand, A
and $A^c$ partition the sample space and so $P(A) + P(A^c) = 1$. Rewriting this as

$$P(A^c) = 1 - P(A)$$

puts it in the same form as |U \ A| = |U| - |A| since $U \setminus A = A^c$. Notice that the only
difference between the set and probability equations is the presence of the function P and
the fact that P(U) = 1. Also notice that the probability form did not assume that the
selection was made uniformly at random.


What about A ∪ B? It corresponds to the union of the disjoint regions labeled 2, 3
and 4 in the Venn diagram. Thus

$$P(A \cup B) = P(A - B) + P(A \cap B) + P(B - A)$$

by Theorem 9. We can express P(A - B) in terms of P(A) and P(A ∩ B) because A is the
disjoint union of A - B and A ∩ B: P(A) = P(A - B) + P(A ∩ B). Solving for P(A - B)
and writing a similar expression for P(B - A):

$$P(A - B) = P(A) - P(A \cap B), \qquad P(B - A) = P(B) - P(A \cap B).$$




Combining our previous results,

$$P(A \cup B) = P(A - B) + P(A \cap B) + P(B - A) = P(A) + P(B) - P(A \cap B).$$

There is a less formal way of saying this. If we add |A| and |B|, we count the region labeled
3 twice: once in A and once in B. The region labeled 3 corresponds to A ∩ B since it is
the region that belongs to both A and B. Thus |A| + |B| gives us regions 2, 3 and 4 (which
is |A ∪ B|) and a second copy of 3 (which is |A ∩ B|). We have shown that

$$|A| + |B| = |A \cup B| + |A \cap B|.$$

The probability form is P(A) + P(B) = P(A ∪ B) + P(A ∩ B). We can rewrite this as

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

(This is the two set case of the Principle of Inclusion and Exclusion.)


One more example: Using DeMorgan’s Rule from Theorem 6, $(A \cup B)^c = A^c \cap B^c$.
(Check this out in the Venn diagram.) Combining the results of the two previous paragraphs,

$$P(A^c \cap B^c) = 1 - P(A \cup B) = 1 - \bigl(P(A) + P(B) - P(A \cap B)\bigr) = 1 - P(A) - P(B) + P(A \cap B).$$

This is another version of the Principle of Inclusion and Exclusion. □
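These identities can be checked on any finite example. In the Python sketch below, the sample space {1, ..., 30} with uniform probability and the events “even” and “multiple of 3” are arbitrary illustrative choices, not taken from the text.

```python
from fractions import Fraction

U = set(range(1, 31))                 # illustrative sample space, uniform probability
A = {t for t in U if t % 2 == 0}      # illustrative event: t is even
B = {t for t in U if t % 3 == 0}      # illustrative event: t is a multiple of 3


def P(E):
    """Uniform probability |E| / |U|."""
    return Fraction(len(E), len(U))


def comp(E):
    """Complement of E in U."""
    return U - E


# DeMorgan's Rule: (A u B)^c = A^c n B^c
assert comp(A | B) == comp(A) & comp(B)
# Two-set inclusion and exclusion, in both forms from Example 26
assert P(A | B) == P(A) + P(B) - P(A & B)
assert P(comp(A) & comp(B)) == 1 - P(A) - P(B) + P(A & B)
print(P(A), P(B), P(A & B), P(A | B))
```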


Example 27 (Combining events) Let U be a sample space with probability function
P. Let A and B be events. Suppose we know that

• A occurs with probability 7/15,
• B occurs with probability 6/15, and
• the probability that neither of the events occurs is 3/15.

What is the probability that both of the events occur?

Let’s translate the given information into mathematical notation. The first two data
are easy: P(A) = 7/15 and P(B) = 6/15. What about the last? What is the event
corresponding to neither of A and B occurring? One person might say $A^c \cap B^c$; another
might say $(A \cup B)^c$. Both are correct by DeMorgan’s Rule. Thus the third datum can be
written $P((A \cup B)^c) = P(A^c \cap B^c) = 3/15$. We are asked to find P(A ∩ B).


What do we do now? A Venn diagram can help. The situation is shown in the following 
Venn diagram for A and B. The rectangle stands for U, the whole sample space. (We’ve 
put in some numbers that we haven’t computed yet, so you should ignore them.) 


[Figure: Venn diagram of events A and B in the sample space U.]




We have been given just partial information, namely P(A) = 7/15, P(B) = 6/15, and
$P((A \cup B)^c) = 3/15$. The best way to work such problems is to use the information given
to find, if possible, the probabilities of the four fundamental regions associated with A and
B, namely the regions

$(A \cup B)^c$,   A - B,   B - A,   A ∩ B.

(You should identify and label the regions in the figure.) Recall that $P(E^c) = 1 - P(E)$
for any event E. Thus

$$P(A \cup B) = 1 - P((A \cup B)^c) = 1 - 3/15 = 12/15.$$

From this we get (check the Venn diagram)

$$P(A - B) = P(A \cup B) - P(B) = 12/15 - 6/15 = 6/15.$$

Similarly, P(B - A) = 12/15 - 7/15 = 5/15. Finally,

$$P(A \cap B) = P(A) - P(A - B) = 7/15 - 6/15 = 1/15.$$

The answer to the question we were asked at the beginning is that P(A ∩ B) = 1/15. □
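The steps of this solution can be packaged as a small function. This Python sketch (the name `four_regions` is ours) takes P(A), P(B) and $P((A \cup B)^c)$ and returns the probabilities of the four fundamental regions with exact fractions. It assumes the data are consistent and raises an error if some region would get a negative probability.

```python
from fractions import Fraction


def four_regions(p_a, p_b, p_neither):
    """Return P((A u B)^c), P(A - B), P(B - A), P(A n B) from the given data."""
    p_union = 1 - p_neither          # P(A u B) = 1 - P((A u B)^c)
    p_a_only = p_union - p_b         # P(A - B) = P(A u B) - P(B)
    p_b_only = p_union - p_a         # P(B - A) = P(A u B) - P(A)
    p_both = p_a - p_a_only          # P(A n B) = P(A) - P(A - B)
    regions = (p_neither, p_a_only, p_b_only, p_both)
    if min(regions) < 0:
        raise ValueError("inconsistent data: a region has negative probability")
    return regions


regions = four_regions(Fraction(7, 15), Fraction(6, 15), Fraction(3, 15))
print(regions, sum(regions))
```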


Example 28 (Odds and combining events) Let U be a sample space and let A and
B be events where the odds of A occurring are 1:2, the odds of B occurring are 5:4 and
the odds of both A and B occurring are 1:8. Find the odds of neither A nor B occurring.

In popular culture, probabilities are often expressed as odds. If an event E occurs with
odds a:b, then it occurs with probability P(E) = a/(a + b). Thus, P(A) = 1/3, P(B) = 5/9, and
P(A ∩ B) = 1/9. From the equation P(A ∪ B) = P(A) + P(B) - P(A ∩ B) in Example 26,
P(A ∪ B) = 3/9 + 5/9 - 1/9 = 7/9. From the equation $P(A^c) = 1 - P(A)$ in that example, with A ∪ B
replacing A, we have $P((A \cup B)^c) = 2/9$. The odds of neither A nor B occurring are 2:7.

Caution: It is not always clear what odds mean. If someone says that the odds on
Beatlebomb in a horse race are 100:1, this means that the probability is 100/(100 + 1) that
Beatlebomb will lose. The probability that he will win is 1/(100 + 1). □
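Converting between odds and probabilities is mechanical. A minimal Python sketch, with helper names of our own, that redoes Example 28 exactly:

```python
from fractions import Fraction


def odds_to_prob(a, b):
    """Odds a:b for an event E means P(E) = a / (a + b)."""
    return Fraction(a, a + b)


def prob_to_odds(p):
    """Inverse conversion: odds a:b in lowest terms with a / (a + b) = p."""
    p = Fraction(p)
    return p.numerator, p.denominator - p.numerator


p_a = odds_to_prob(1, 2)        # P(A) = 1/3
p_b = odds_to_prob(5, 4)        # P(B) = 5/9
p_ab = odds_to_prob(1, 8)       # P(A n B) = 1/9
p_union = p_a + p_b - p_ab      # inclusion and exclusion
p_neither = 1 - p_union         # complement of A u B
print(prob_to_odds(p_neither))
```

Note that the sketch reads a:b as odds in favor of the event; as the caution above says, a bookmaker quoting 100:1 usually means the reverse.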


Example 29 (Hypergeometric probabilities) Six light bulbs are chosen at random
from 18 light bulbs of which 8 are defective. What is the probability that exactly two of
the chosen bulbs are defective? We’ll do the general situation. Let

• B denote the total number of bulbs,
• D the total number of defective bulbs, and
• b the number of bulbs chosen.

Let the probability space be the set of all $\binom{B}{b}$ ways to choose b bulbs from B and let the
probability be uniform. Let E(B,D,b,d) be the event consisting of all selections of b from
B when a total of D bulbs are defective and d of the selected bulbs are defective. We want
P(E(B,D,b,d)). The total number of ways to choose b, of which exactly d are defective,
is $\binom{D}{d}\binom{B-D}{b-d}$. To see this, first choose d bulbs from the D defective bulbs and then choose
b - d bulbs from the B - D good bulbs. Thus,

$$P(E(B,D,b,d)) = \frac{\binom{D}{d}\binom{B-D}{b-d}}{\binom{B}{b}}.$$

Substituting B = 18, b = 6, D = 8, and d = 2 gives P(E(18,8,6,2)) = 5880/18564 ≈ 0.32, the answer to
our original question.

The function P(E(B,D,b,d)) occurs frequently. It is called the hypergeometric probability
distribution. □
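The hypergeometric formula is a one-liner with `math.comb` (Python 3.8 or later). In this sketch the function name `hypergeometric` is ours; it returns 0 when d or b - d is negative, since then no selection has exactly d defective bulbs (`math.comb` itself already returns 0 when the lower index exceeds the upper).

```python
from math import comb


def hypergeometric(B, D, b, d):
    """P(E(B, D, b, d)): probability that exactly d of b bulbs chosen
    uniformly from B bulbs are defective, when D of the B are defective."""
    if d < 0 or b - d < 0:
        return 0.0
    return comb(D, d) * comb(B - D, b - d) / comb(B, b)


print(round(hypergeometric(18, 8, 6, 2), 4))                 # Example 29
# Summing over every possible d should give 1, up to rounding.
print(sum(hypergeometric(18, 8, 6, d) for d in range(7)))
```

The same function applies to any situation of this shape, for example dealing cards where “defective” means “a spade.”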


Example 30 (Sampling with replacement from six cards) First one card and then
a second card are selected at random, with replacement, from 6 cards numbered 1 to 6.
What is the probability that the sum of the values on the cards equals 7? That the sum of
the values of the cards is divisible by 5? Since both cards are selected from the same set of
cards numbered one to six, this process is called “sampling with replacement.” The idea is
that one can choose the first card, write down its number, replace it and repeat the process
a second (or more) times. The basic sample space is S × S = {(i,j) | 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}.
Every point in this sample space is viewed as equally likely. Call the two events of interest
$E_7$ (sum equals 7) and $D_5$ (sum divisible by 5). It is helpful to have a way of visualizing
S × S. This can be done as follows:

[Figure: two 6 × 6 arrays with rows labeled i and columns labeled j; the right-hand array shows the sum i + j in square (i,j).]

The 6 × 6 rectangular array has 36 squares. The square with row label i and column label
j corresponds to (i,j) ∈ S × S. The rectangular array on the right has the sum i + j in
square (i,j). Thus

$$E_7 = \{(i,j) : 1 \le i \le 6,\ 1 \le j \le 6,\ i + j = 7\}$$

corresponds to six points in the sample space and so $P(E_7) = |E_7|/36 = 6/36$.

A number k is divisible by 5 if k = 5j for some integer j. In that case, we write 5 | k.
Thus

$$D_5 = \{(i,j) : 1 \le i \le 6,\ 1 \le j \le 6,\ 5 \mid (i + j)\}$$

and so $D_5 = E_5 \cup E_{10}$, where $E_s$ denotes the event that the sum equals s. Finally,
$|D_5| = |E_5| + |E_{10}| = 4 + 3 = 7$ and $P(D_5) = 7/36$. □
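Enumerating S × S directly reproduces both answers, and printing the sums recreates the right-hand array described above. A Python sketch using `itertools.product`:

```python
from fractions import Fraction
from itertools import product

S = range(1, 7)
sample_space = list(product(S, S))      # the 36 equally likely pairs (i, j)

E7 = [(i, j) for (i, j) in sample_space if i + j == 7]
D5 = [(i, j) for (i, j) in sample_space if (i + j) % 5 == 0]

print(Fraction(len(E7), len(sample_space)), Fraction(len(D5), len(sample_space)))

# The array of sums: row i, column j holds i + j.
for i in S:
    print(" ".join(f"{i + j:2d}" for j in S))
```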
Example 31 (Girls and boys sit in a row) Four girls and two boys sit in a row.
Find the probability that each boy has a girl to his left and to his right. Suppose that
the girls are $g_1, g_2, g_3, g_4$ and the boys are $b_1, b_2$. There are 6! = 720 ways of putting these
six people in a row. This set of 720 such permutations is the sample space S, and we
assume each permutation is equally likely. We want the permutations in which each boy
has a girl on his left and one on his right. There are three patterns where
each boy has a girl on both his left and his right: gbgbgg, gbggbg, and ggbgbg. For each
pattern, there are 2! 4! = 48 ways of placing the girls and boys into that pattern. Thus,
(3 × 48)/6! = 144/720 = 1/5 is the required probability. Note that we could have also
taken the sample space to be the set of $\binom{6}{2} = 15$ patterns. Each pattern is equally likely since
each arises from the same number of arrangements of the 6 children. The probability would
then be computed as $3/\binom{6}{2} = 3/15 = 1/5$. □
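A brute-force check over all 720 orderings, as a Python sketch (the labels g1, ..., b2 follow the text; the helper `boys_flanked` is ours). It also confirms the coarser count using g/b patterns.

```python
from fractions import Fraction
from itertools import permutations

people = ["g1", "g2", "g3", "g4", "b1", "b2"]


def boys_flanked(row):
    """True if every boy has a girl immediately to his left and to his right."""
    for pos, person in enumerate(row):
        if person.startswith("b"):
            if pos == 0 or pos == len(row) - 1:
                return False          # a boy at either end lacks a neighbor
            if row[pos - 1].startswith("b") or row[pos + 1].startswith("b"):
                return False          # the two boys are adjacent
    return True


rows = list(permutations(people))
good = [r for r in rows if boys_flanked(r)]
print(len(rows), len(good), Fraction(len(good), len(rows)))

# The coarser sample space of g/b patterns gives the same answer.
patterns = {"".join(p[0] for p in r) for r in rows}
good_patterns = {"".join(p[0] for p in r) for r in good}
print(sorted(good_patterns), Fraction(len(good_patterns), len(patterns)))
```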


Example 32 (Dealing cards from a standard deck of 52 cards) A man is dealt 4
spade cards from an ordinary deck of 52 cards. If he is given five more cards, what is the
probability that three of them are spades? This is another example of the hypergeometric
probability distribution. There are B = 48 cards remaining, D = 9 of them spades. We
ask for the probability that, from b = 5 cards selected, d = 3 are spades:

$$P(E(B,D,b,d)) = \frac{\binom{D}{d}\binom{B-D}{b-d}}{\binom{B}{b}} = \frac{\binom{9}{3}\binom{39}{2}}{\binom{48}{5}} = \frac{62{,}244}{1{,}712{,}304} \approx 0.036.$$

□


Example 33 (Selecting points at random from a square) Suppose we have a square 
with side s and inside it is a circle of diameter d < 1. A point is selected uniformly at 
random from the square. What is the probability that the point selected lies inside the 
circle? 


We haven’t defined probability for infinite sample spaces. The intuition is that probability
is proportional to area; this is a “geometric probability” problem. Thus we have

$$P(E) = \frac{\operatorname{area}(E)}{\operatorname{area}(U)},$$

where U is the sample space, which is the set of points in the square, and E is the set of
points inside the circle. Computing areas, we obtain $P(E) = \pi d^2/(4s^2)$. This is the correct
answer. Clearly, this answer doesn’t depend on the figure being a circle. It could be any
figure of area $\pi d^2/4$ that fits inside the square. □
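Since the argument here rests on intuition about area, a quick simulation makes a useful sanity check. This Python sketch assumes the square is [0, s] × [0, s] with the circle centered in it (any position inside the square would do) and uses a fixed seed so the run is repeatable. The estimate should be close to, but not exactly equal to, $\pi d^2/(4s^2)$.

```python
import math
import random


def estimate(s, d, trials=100_000, seed=1):
    """Monte Carlo estimate of P(point in circle) for a circle of diameter d
    centered in the square [0, s] x [0, s]."""
    rng = random.Random(seed)
    c, r = s / 2, d / 2
    hits = 0
    for _ in range(trials):
        x, y = rng.uniform(0, s), rng.uniform(0, s)
        if (x - c) ** 2 + (y - c) ** 2 <= r * r:
            hits += 1
    return hits / trials


s, d = 1.0, 0.8
print(estimate(s, d), math.pi * d ** 2 / (4 * s ** 2))
```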


The next example deals with the following question: If k items are randomly put one
at a time into n boxes, what are the chances that no box contains more than one item? A
related problem that can be dealt with in a similar manner is the following: If I choose
$k_1$ items at random from n items and you choose $k_2$ items from the same n, what are the
chances that our choices contain no items in common? These problems arise in the analysis
of some algorithms.




Example 34 (The birthday problem) Assume that all days of the year are equally
likely to be birthdays and ignore leap years. If k people are chosen at random, what is
the probability that they all have different birthdays? While we’re at it, let’s replace the
number of days in a year with n.

Here’s one way we can think about this. Arrange the people in a line. Their birthdays,
listed in the same order as the people, are $b_1, b_2, \ldots, b_k$. The probability space is
n × n × ··· × n, where there are k copies of n. Each of the $n^k$ possible k-long lists is
equally likely. We are interested in P(A), where A consists of those lists without repeats.
Thus |A| = n(n - 1)···(n - (k - 1)) and so

$$P(A) = \frac{n(n-1)\cdots(n-(k-1))}{n^k} = \prod_{i=1}^{k-1}\left(1 - \frac{i}{n}\right).$$


While this answer is perfectly correct, it does not give us any idea how large P(A) is.
Of course, if k is very small, P(A) will be nearly 1, and, if k is very large, P(A) will be
nearly 0. (In fact P(A) = 0 if k > n. This can be proved by using the above formula. You
should do it.) Where does this transition from near 1 to near 0 occur and how does P(A)
behave during the transition? Our goal is to answer this question. We will show that P(A)
is approximately $e^{-k^2/2n}$ when k is not too large and that P(A) is close to zero otherwise.

Here’s a graph of P(A) (“staircase” curve) and the approximation function $e^{-k^2/2n}$ (smooth
curve) for various values of k when n = 365.⁷ As you can see, the approximation is quite
accurate.

[Figure: P(A) (staircase) and $e^{-k^2/2n}$ (smooth curve) plotted against k for n = 365.]
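To try various values of n and k, as suggested next, here is a short Python sketch (function names ours) that computes the exact product for P(A) alongside the approximation $e^{-k^2/2n}$, with n = 365 by default.

```python
import math


def p_distinct(k, n=365):
    """Exact P(A): product of (1 - i/n) for i = 1, ..., k - 1."""
    p = 1.0
    for i in range(1, k):
        p *= 1 - i / n
    return p


def approx(k, n=365):
    """The approximation e^(-k^2 / 2n)."""
    return math.exp(-k * k / (2 * n))


for k in (10, 20, 23, 30, 40, 50):
    print(f"k={k:2d}  exact={p_distinct(k):.4f}  approx={approx(k):.4f}")
```

Passing a different `n` to both functions lets you see how the transition point moves with n.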
If you are mildly interested in this problem, you should at least get familiar with this
approximation by trying various values of n and k. If you are interested in the derivation
of this result and know a bit of calculus, read on. We assume that $k < n^{3/5}$ to start our
analysis.⁸ We need the following fact, which will be proved at the end of the example:

$$\text{If } 0 \le x \le 1/2, \text{ then } e^{-x-x^2} \le 1 - x \le e^{-x}.$$


⁷ Since P(A) is defined only for k an integer, it should be a series of dots. To make it
more visible, we’ve plotted it as a step function (staircase). The approximation is given by
the function $e^{-k^2/2n}$, which is a smooth curve.

⁸ When people work on problems like this, they usually try $k < n^t$ and do some calculations
to find a good choice for t. This is how the 3/5 comes about.
First, we get an upper bound for P(A). Using $1 - x \le e^{-x}$ with $x = i/n$,

$$P(A) = \prod_{i=1}^{k-1}\left(1 - \frac{i}{n}\right) \le \prod_{i=1}^{k-1} e^{-i/n} = \exp\left(-\sum_{i=1}^{k-1} \frac{i}{n}\right).$$

Using the formula⁹ $1 + 2 + \cdots + N = N(N+1)/2$ with N = k - 1, we have

$$\sum_{i=1}^{k-1} \frac{i}{n} = \frac{(k-1)k}{2n} = \frac{k^2}{2n} - \frac{k}{2n}.$$

Thus $P(A) \le e^{-k^2/2n}\, e^{k/2n}$. Since $0 < k/n < n^{3/5}/n = n^{-2/5}$, which is small when n is
large, $e^{k/2n}$ is close to 1. Thus, we have an upper bound for P(A) that is close to $e^{-k^2/2n}$.

Next, we get our lower bound for P(A). From the other inequality in our fact, namely
$1 - x \ge e^{-x-x^2}$, we have

$$P(A) = \prod_{i=1}^{k-1}\left(1 - \frac{i}{n}\right) \ge \prod_{i=1}^{k-1} e^{-i/n - (i/n)^2} = \left(\prod_{i=1}^{k-1} e^{-i/n}\right)\left(\prod_{i=1}^{k-1} e^{-(i/n)^2}\right).$$

(The fact applies because each $x = i/n$ is less than $k/n < n^{-2/5}$, which is at most 1/2 once
n ≥ 6.) Let’s look at the last product. It is less than 1. Since i < k, all of the factors in the
product are greater than $e^{-(k/n)^2}$. Since there are fewer than k factors, the product is greater
than $e^{-k^3/n^2}$. Since $k < n^{3/5}$, $k^3/n^2 < n^{9/5}/n^2 = n^{-1/5}$, which is small when n is large. Thus
the last product is close to 1 when n is large. This shows that P(A) has a lower bound
which is close to $\prod_{i=1}^{k-1} e^{-i/n}$, which is the upper bound estimate we got in the previous
paragraph. Since our upper and lower bounds are close together, they are both close to
P(A). In the previous paragraph, we showed that the upper bound is close to $e^{-k^2/2n}$.


To summarize, we have shown that

If n is large and $k < n^{3/5}$, then P(A) is close to $e^{-k^2/2n}$.


What happens when $k \ge n^{3/5}$?

• First note that P(A) decreases as k increases. You can see this by thinking about
the original problem. You can also see it by looking at the product we obtained for
P(A), noting that each factor is less than 1 and noting that we get more factors as k
increases.

• Second note that, when k is near $n^{3/5}$ but smaller than $n^{3/5}$, then $k^2/2n$ (which is about
$n^{1/5}/2$) is large and so P(A) is near 0, since e to a large negative power is near 0.

Putting these together, we see that P(A) must be near 0 when $k \ge n^{3/5}$.


How does $e^{-k^2/2n}$ behave? When k is much smaller than $\sqrt{n}$, $k^2/2n$ is close to 0 and
so $e^{-k^2/2n}$ is close to 1. When k is much larger than $\sqrt{n}$, $k^2/2n$ is large and so $e^{-k^2/2n}$ is
close to 0. Put in terms of birthdays, for which n = 365 and $\sqrt{365} \approx 19$:

• When k is much smaller than 19, the probability of distinct birthdays is nearly 1.
• When k is much larger than 19, the probability of distinct birthdays is nearly 0.
• In between, the probability of distinct birthdays is close to $e^{-k^2/(2 \times 365)}$.

⁹ This is a formula you should have learned in a previous class.


We now prove our fact. It requires calculus. By Taylor’s theorem, the series for the
natural logarithm is

$$\ln(1 - x) = -\sum_{k=1}^{\infty} \frac{x^k}{k}.$$

Since x ≥ 0, every term $-x^k/k$ is at most 0. Throwing away all but the first term,
$\ln(1 - x) \le -x$. Exponentiating, we have $1 - x \le e^{-x}$, which is half of our fact.

Note that

$$\sum_{k=1}^{\infty} \frac{x^k}{k} = x + x^2 \sum_{k=2}^{\infty} \frac{x^{k-2}}{k}.$$

Since 0 ≤ x ≤ 1/2 and k ≥ 2 in the second sum, we have $x^{k-2}/k \le (1/2)^{k-2}/2$. By the
formula for the sum of a geometric series,¹⁰

$$\sum_{k=2}^{\infty} \frac{(1/2)^{k-2}}{2} = \frac{1}{2} + \left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^3 + \cdots = \frac{1/2}{1 - 1/2} = 1.$$

Thus

$$\sum_{k=1}^{\infty} \frac{x^k}{k} = x + x^2 \sum_{k=2}^{\infty} \frac{x^{k-2}}{k} \le x + x^2,$$

and so $\ln(1 - x) \ge -x - x^2$, which gives us the other half of our fact. □
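A numerical spot check of the fact is easy, though it is of course no substitute for the proof. In this Python sketch `fact_holds` is our own name; the last line shows that the lower bound can fail once x is well beyond 1/2, which is why the restriction on x matters.

```python
import math


def fact_holds(x):
    """Check e^(-x - x^2) <= 1 - x <= e^(-x) at a single point x."""
    return math.exp(-x - x * x) <= 1 - x <= math.exp(-x)


# Every multiple of 1/1000 in [0, 1/2].
print(all(fact_holds(i / 1000) for i in range(0, 501)))
# Outside the range the lower bound can fail, e.g. at x = 0.9.
print(fact_holds(0.9))
```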


Exercises for Section 4 


4.1. Six horses are in a race. You pick two of them at random and bet on them both. 
Find the probability that you picked the winner. State clearly what your probability 
space is. 


4.2. A roulette wheel consists of 38 containers numbered 0 to 36 and 00. In a fair wheel
the ball is equally likely to fall into each container. A special wheel is designed in 
which all containers are the same size except that 00 is 5% larger than any of the 
others so that 00 has a 5% greater chance of occurring than any of the other values. 
What is the probability that 00 will occur on a spin of the wheel? 


4.3. Alice and Bob have lost a key at the beach. They each get out their metal detectors 
and hunt until the key is found. If Alice can search 20% faster than Bob, what are 
the odds that she finds the key? What is the probability that Alice finds the key? 


¹⁰ Recall that the sum of the geometric series $a + ar + ar^2 + \cdots$ is $a/(1 - r)$ when |r| < 1. You should
be able to see that here a = 1/2 and r = 1/2.
4.4. Six horses are in a race. You pick two of them at random and bet on them both.
Find the probability that you picked a horse that won or placed (came in second).
This should include the possibility that one of your picks won and the other placed.

4.5. Suppose 4 different balls are placed into 4 labeled boxes at random. (This can be
done in $4^4$ ways.)

(a) What is the probability that no box is empty?
(b) What is the probability that exactly one box is empty?
(c) What is the probability that at least one box is empty?
(d) Repeat (a)-(c) if there are 5 balls and 4 boxes.

4.6. For each event E determine P(E).

(a) Suppose a fair die is thrown k times and the values shown are recorded. What
is the sample space? What is the probability of the event E that the sum of
the values is even?

(b) A card is drawn uniformly at random from a regular deck of cards. This process
is repeated n times, with replacement. What is the sample space? What is the
probability that a king, K, doesn’t appear on any of the draws? What is the
probability that at least one K appears in n draws?

(c) An urn contains 3 white, 4 red, and 5 blue marbles. Two marbles are drawn
without replacement. What is the sample space? What is the probability that
both marbles are red?

4.7. Six light bulbs are chosen at random from 15 bulbs of which 5 are defective. What
is the probability that exactly 3 are defective?

4.8. An urn contains ten labeled balls, labels 1, 2, ..., 10.

(a) Two balls are drawn together. What is the sample space? What is the probability
that the sum of the labels on the balls is odd?

(b) Two balls are drawn one after the other without replacement. What is the
sample space? What is the probability that the sum is odd?

(c) Two balls are drawn one after the other with replacement. What is the sample
space? What is the probability that the sum is odd?

4.9. Let A and B be events with P(A) = 3/8, P(B) = 1/2, and $P((A \cup B)^c) = 3/8$.
What is P(A ∩ B)?

4.10. Of the students at a college, 20% are computer science majors and 58% of the
entire student body are women. 430 of the 5,000 students at the college are women
majoring in computer science.

(a) How many women are not computer science majors?

(b) How many men are not computer science majors?


4.11. The odds on the horse Beatlebomb in the Kentucky Derby are 100 to 1. A man
at the races tells his wife that he is going to flip a coin. If it comes up heads he
will bet on Beatlebomb, otherwise he will skip this race and not bet. What is the
probability that he bets on Beatlebomb and wins?

4.12. Four persons, called North, South, East, and West, are each dealt 13 cards from an
ordinary deck of 52 cards. If South has exactly two aces, what is the probability
that North has the other two aces?

4.13. You have been dealt 4 cards and discover that you have 3 of a kind; that is, 3 cards
have the same face value and the fourth is different. For example, you may have
been dealt three 4s and a 10. The other three players each receive four cards, but
you do not know what they have been dealt. What is the probability that the fifth
card will improve your hand by making it 4 of a kind or a full house (3 of a kind
and a pair)?

4.14. Three boys and three girls are lined up in a row.

(a) What is the probability of all three girls being together?

(b) Suppose they are then seated around a circular table with six seats in the
same order they were lined up. What is the probability that all three girls sit
together?

4.15. Prove the principle of inclusion exclusion for three sets, namely that

$$P(A^c \cap B^c \cap C^c) = 1 - P(A) - P(B) - P(C) + P(A \cap B) + P(A \cap C) + P(B \cap C) - P(A \cap B \cap C).$$

(The formula extends in a fairly obvious way to any number of sets.)
Hint: Recall that, for two sets, $P(A^c \cap B^c) = 1 - P(A) - P(B) + P(A \cap B)$.

4.16. A point is selected uniformly at random on a stick. This stick is broken at this
point. What is the probability that the longer piece is at least twice the length of
the shorter piece?

4.17. Two points are selected uniformly at random on a stick of unit length. The stick
is broken at these two points. What is the probability that the three pieces form a
triangle?

*4.18. What is the probability that a coin of diameter d < 1 when tossed onto the Euclidean
plane (i.e., R × R, R the real numbers) covers a lattice point of the plane
(i.e., a point (p,q), where p and q are integers)?
Hint: Compare this problem with Example 33.

*4.19. Three points are selected at random on a circle C. What is the probability that all
three points lie on a common semicircle of C? What if 3 is replaced by k?




Review Questions 


Multiple Choice Questions for Review 


1. Suppose there are 12 students, among whom are three students, M, B, C (a Math 
Major, a Biology Major, a Computer Science Major). We want to send a delegation 
of four students (chosen from the 12 students) to a convention. How many ways can 
this be done so that the delegation includes exactly two (not more, not less) students 
from {M, B, C}?


(a) 32 (b) 64 (c) 88 (d) 108 (e) 144


2. The permutations of {a, b, c, d, e, f, g} are listed in lex order. What permutations are
just before and just after bacdefg?


(a) Before: agfedbc, After: bacdfge
(b) Before: agfedcb, After: badcefg
(c) Before: agfebcd, After: bacedgf
(d) Before: agfedcb, After: bacdfge
(e) Before: agfedcb, After: bacdegf


3. Teams A and B play in a basketball tournament. The first team to win two games in 
a row or a total of three games wins the tournament. What is the number of ways the 
tournament can occur? 


(a) 8 (b) 9 (c) 10 (d) 11 (e) 12 


4. The number of four letter words that can be formed from the letters in BUBBLE (each 
letter occurring at most as many times as it occurs in BUBBLE) is 


(a) 72 (b) 74 (c) 76 (d) 78 (e) 80 

5. The number of ways to seat 3 boys and 2 girls in a row if each boy must sit next to at 
least one girl is 
(a) 36 (b) 48 (c) 148 (d) 184 (e) 248 

6. Suppose there are ten balls in an urn, four blue, four red, and two green. The balls
are also numbered 1 to 10. How many ways are there to select an ordered sample of
four balls without replacement such that there are two blue balls and two red balls in
the sample?


(a) 144 (b) 256 (c) 446 (d) 664 (e) 864
7. How many different rearrangements are there of the letters in the word BUBBLE? 
(a) 40 (b) 50 (c) 70 (d) 80 (e) 120 


8. The English alphabet has 26 letters of which 5 are vowels (A,E,I,O,U). How many 
seven letter words, with all letters distinct, can be formed that start with B, end with 
the letters ES, and have exactly three vowels? The “words” for this problem are just 
strings of letters and need not have linguistic meaning. 


(ay 2? ea" x17 
(b) 23 « 3? x 19 
(c) 24% 37 x 19
(dj: 2° 837 19 
(6). 2? 3? 17 


9. The permutations on {a, b, c, d, e, f, g} are listed in lex order. All permutations $x_1x_2x_3x_4x_5x_6x_7$
with $x_4 = a$ or $x_4 = c$ are kept. All others are discarded. In this reduced list what
permutation is just after dagcfeb?


(a) dbacefg
(b) dbcaefg
(c) dbacgfe
(d) dagcfbe
(e) dcbaefg
10. The number of four letter words that can be formed from the letters in SASSABY
(each letter occurring at most as many times as it occurs in SASSABY) is
(a) 78 (b) 90 (c) 108 (d) 114 (e) 120
11. How many different rearrangements are there of the letters in the word TATARS if the 
two A’s are never adjacent? 
(a) 24 (b) 120 (c) 144 (d) 180 (e) 220 


12. Suppose there are ten balls in an urn, four blue, four red, and two green. The balls
are also numbered 1 to 10. How many ways are there to select an ordered sample of
four balls without replacement such that the number B ≥ 0 of blue balls, the number
R ≥ 0 of red balls, and the number G ≥ 0 of green balls are all different?


(a) 256 (b) 864 (c) 1152 (d) 1446 (e) 2144


13. Suppose there are ten balls in an urn, four blue, four red, and two green. The balls
are also numbered 1 to 10. You are asked to select an ordered sample of four balls
without replacement. Let B ≥ 0 be the number of blue balls, R ≥ 0 be the number
of red balls, and G ≥ 0 be the number of green balls in your sample. How many ways
are there to select such a sample if exactly one of B, R, or G must be zero?


(a) 256 (b) 1152 (c) 1446 (d) 2144 (e) 2304
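The three urn questions (6, 12 and 13) can be checked by brute force, because there are only 10 · 9 · 8 · 7 = 5040 ordered samples. This sketch is not from the text. It assumes a hypothetical numbering in which balls 1 to 4 are blue, 5 to 8 are red and 9 to 10 are green; any assignment of numbers to colors gives the same counts. It reads B, R and G as the numbers of blue, red and green balls in a sample, each of which may be 0.

```python
from itertools import permutations

# Hypothetical numbering (assumption): balls 1-4 blue, 5-8 red, 9-10 green.
COLOR = {ball: "blue" for ball in range(1, 5)}
COLOR.update({ball: "red" for ball in range(5, 9)})
COLOR.update({ball: "green" for ball in range(9, 11)})


def color_counts(sample):
    """Return (B, R, G), the numbers of blue, red and green balls in a sample."""
    colors = [COLOR[ball] for ball in sample]
    return colors.count("blue"), colors.count("red"), colors.count("green")


def count_ordered_samples(condition):
    """Count ordered 4-samples drawn without replacement whose (B, R, G) satisfy condition."""
    return sum(1 for sample in permutations(range(1, 11), 4)
               if condition(*color_counts(sample)))


q6 = count_ordered_samples(lambda b, r, g: b == 2 and r == 2)
q12 = count_ordered_samples(lambda b, r, g: len({b, r, g}) == 3)      # all different
q13 = count_ordered_samples(lambda b, r, g: (b, r, g).count(0) == 1)  # exactly one is zero
print(q6, q12, q13)  # 864 1152 2304
```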
14. The number of partitions of X = {a, b, c, d} with a and b in the same block is
(a) 4 (b) 5 (c) 6 (d) 7 (e) 8


15. Let $W_{a,b}$ and $W_{a,c}$ denote the set of partitions of X = {a, b, c, d, e} with a and b belonging
to the same block and with a and c belonging to the same block, respectively. Similarly,
let $W_{a,b,c}$ denote the set of partitions of X = {a, b, c, d, e} with a, b, and c belonging to
the same block. What is $|W_{a,b} \cup W_{a,c}|$? (Note: B(3) = 5, B(4) = 15, B(5) = 52, where
B(n) is the number of partitions of an n-element set.)


(a) 25 (b) 30 (c) 35 (d) 40 (e) 45


16. The number of partitions of X = {a, b, c, d, e, f, g} with a, b, and c in the same block
and c, d, and e in the same block is


(a) 2 (b) 5 (c) 10 (d) 15 (e) 52


17. Three boys and four girls sit in a row with all arrangements equally likely. Let x be
the probability that no two boys sit next to each other. What is x?


(a) 1/7 (b) 2/7 (c) 3/7 (d) 4/7 (e) 5/7


18. A man is dealt 4 spade cards from an ordinary deck of 52 cards. He is given 2 more
cards. Let x be the probability that they both are the same suit. Which is true?


(a) 0 < x < .1
(b) .1 < x < .2
(c) .2 < x < .3
(d) .3 < x < .4
(e) .4 < x < .5


19. Six light bulbs are chosen at random from 15 bulbs of which 5 are defective. What is
the probability that exactly one is defective?


(a) C(5,1)C(10,6)/C(15,6)
(b) C(5,1)C(10,5)/C(15,6)
(c) C(5,1)C(10,1)/C(15,6)
(d) C(5,0)C(10,6)/C(15,6)
(e) C(5,0)C(10,5)/C(15,6)


20. A small deck of five cards is numbered 1 to 5. First one card and then a second card
are selected at random, with replacement. What is the probability that the sum of the
values on the cards is a prime number?


(a) 10/25 (b) 11/25 (c) 12/25 (d) 13/25 (e) 14/25


21. Let A and B be events with P(A) = 6/15, P(B) = 8/15, and $P((A \cup B)^c) = 3/15$.
What is $P(A \cap B)$?


(a) 1/15 (b) 2/15 (c) 3/15 (d) 4/15 (e) 5/15


22. Suppose the odds of A occurring are 1:2, the odds of B occurring are 5:4, and the odds
of both A and B occurring are 1:8. The odds of $(A \cap B^c) \cup (B \cap A^c)$ occurring are


(a) 2:3 (b) 4:3 (c) 5:3 (d) 6:3 (e) 7:3

23. A pair of fair dice is tossed. Find the probability that the greatest common divisor of
the two numbers is one.

(a) 12/36 (b) 15/36 (c) 17/36 (d) 19/36 (e) 23/36

24. Three boys and three girls sit in a row. Find the probability that exactly two of the
girls are sitting next to each other (the remaining girl separated from them by at least
one boy).


(a) 4/20 (b) 6/20 (c) 10/20 (d) 12/20 (e) 13/20


25. A man is dealt 4 spade cards from an ordinary deck of 52 cards. If he is given five
more, what is the probability that none of them are spades?


(a) Coy) (b) CAG) (c) Cares) (d) C/G) (e) Carles)


Answers: 1 (d), 2 
12 (c), 13 (e), 14 (b), 
23 (e), 24 (d), 25 (d).
Functions


Section 1: Some Basic Terminology 


Functions play a fundamental role in nearly all of mathematics. Combinatorics is no
exception. In this section we review the basic terminology and notation for functions.
Permutations are special functions that arise in a variety of ways in combinatorics. Besides
studying them for their own interest, we’ll see them as a central tool in other topic areas.


Except for the real numbers R, rational numbers Q and integers Z, our sets are normally
finite. The set of the first n positive integers, {1, 2, ..., n}, will be denoted by n.


Recall that |A| is the number of elements in the set A. When it is convenient to do
so, we linearly order the elements of a set A. In that case we denote the ordering by
$a_1, a_2, \ldots, a_{|A|}$ or by $(a_1, a_2, \ldots, a_{|A|})$. Unless clearly stated otherwise, the ordering on a
set of numbers is the numerical ordering. For example, the ordering on n is 1, 2, 3, ..., n.


A review of the terminology concerning sets will be helpful. When we speak about 
sets, we usually have a “universal set” U in mind, to which the various sets of our discourse 
belong. Let U be a set and let A and B be subsets of U. 


• The sets $A \cap B$ and $A \cup B$ are the intersection and union of A and B.

• The set $A \setminus B$ or $A - B$ is the set difference of A and B; that is, the set
$\{x : x \in A, x \notin B\}$.

• The set $U \setminus A$ or $A^c$ is the complement of A (relative to U). The complement of A is
also written $A'$ and $\sim A$.


• The set $A \oplus B = (A \setminus B) \cup (B \setminus A)$ is the symmetric difference of A and B; that is, those
x that are in exactly one of A and B. We have $A \oplus B = (A \cup B) \setminus (A \cap B)$.


• $\mathcal{P}(A)$ is the set of all subsets of A. (The notation for $\mathcal{P}(A)$ varies from author to
author.)


• $\mathcal{P}_k(A)$ is the set of all subsets of A of size (or cardinality) k. (The notation for $\mathcal{P}_k(A)$
varies from author to author.)


• The Cartesian product $A \times B$ is the set of all ordered pairs built from A and B:
$A \times B = \{(a,b) \mid a \in A \text{ and } b \in B\}$.


We also call $A \times B$ the direct product of A and B.
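Python's built-in set type implements most of these operations directly. The sketch below uses a hypothetical universal set U and subsets A and B. The helpers `power_set` and `k_subsets` are names introduced here for the power set and for the set of k-element subsets.

```python
from itertools import combinations, product

U = set(range(1, 11))  # hypothetical universal set {1, ..., 10}
A = {1, 2, 3, 4}
B = {3, 4, 5}

intersection = A & B  # A ∩ B
union = A | B         # A ∪ B
difference = A - B    # A \ B
complement = U - A    # complement of A relative to U
sym_diff = A ^ B      # symmetric difference

# Both descriptions of the symmetric difference agree.
assert sym_diff == (A - B) | (B - A) == (A | B) - (A & B)


def k_subsets(s, k):
    """P_k(s): all subsets of s of size k, as frozensets."""
    return {frozenset(c) for c in combinations(s, k)}


def power_set(s):
    """P(s): all subsets of s, collected size by size."""
    return set().union(*(k_subsets(s, k) for k in range(len(s) + 1)))


cartesian = set(product(A, B))  # A × B as a set of ordered pairs (a, b)

print(sorted(sym_diff), len(power_set(A)), len(k_subsets(A, 2)), len(cartesian))
# [1, 2, 5] 16 6 12
```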


If A = B = R, the real numbers, then $\mathbb{R} \times \mathbb{R}$, written $\mathbb{R}^2$, is frequently interpreted as
coordinates of points in the plane. Two points are the same if and only if they have the
same coordinates, which says the same thing as our definition, $(a,b) = (a',b')$ if $a = a'$ and
$b = b'$. Recall that the direct product can be extended to any number of sets. How can
$\mathbb{R} \times \mathbb{R} \times \mathbb{R} = \mathbb{R}^3$ be interpreted?


Definition 1 (Function) If A and B are sets, a function from A to B is a rule that tells
us how to find a unique $b \in B$ for each $a \in A$. We write $f: A \to B$ to indicate that f is a
function from A to B.

We call the set A the domain of f and the set B the range¹ or, equivalently, codomain
of f. To specify a function completely you must give its domain, range and rule.


The set of all functions from A to B is written $B^A$, for a reason we will soon explain. Thus
$f: A \to B$ and $f \in B^A$ say the same thing.


In calculus you dealt with functions whose ranges were R and whose domains were
contained in R; for example, $f(x) = 1/(x^2 - 1)$ is a function from $\mathbb{R} - \{-1, 1\}$ to $\mathbb{R}$.
You also studied functions of functions! The derivative is a function whose domain is all
differentiable functions and whose range is all functions. If we wanted to use functional
notation we could write D(f) to indicate the function that the derivative associates with
f.


Definition 2 (One-line notation) When A is ordered, a function can be written in
one-line notation as $(f(a_1), f(a_2), \ldots, f(a_{|A|}))$. Thus we can think of a function as an
element of $B \times B \times \cdots \times B$, where there are |A| copies of B. Instead of writing $B^{|A|}$
to indicate the set of all functions, we write $B^A$. Writing $B^{|A|}$ is incomplete because the
domain A is not specified. Instead, only its size |A| is given.


Example 1 (Using the notation) To get a feeling for the notation used to specify a 
function, it may be helpful to imagine that you have an envelope or box that contains a 
function. In other words, this envelope contains all the information needed to completely 
describe the function. Think about what you’re going to see when you open the envelope. 


You might see
$P = \{a, b, c\}$, $g: P \to 4$, $g(a) = 3$, $g(b) = 1$ and $g(c) = 4$.


This tells you that the name of the function is g, the domain of g is P, which is {a, b, c}, and
the range of g is 4 = {1, 2, 3, 4}. It also tells you the values in 4 that g assigns to each of
the values in its domain. Someone else may have put


$g: \{a, b, c\} \to 4$, ordering: a, b, c, $g = (3, 1, 4)$.


in the envelope instead. This describes the same function. It doesn’t give a name for the 
domain, but we don’t need a name like P for the set {a, b,c} — we only need to know what 
is in the set. On the other hand, it gives an order on the domain so that the function can 
be given in one-line form. Can you describe other possible envelopes for the same function? 


What if the envelope contained only g = (3, 1, 4)? You’ve been cheated! You must
know the domain of g in order to know what g is. What if the envelope contained


the domain of g is {a, b, c}, ordering: a, b, c, g = (3, 1, 4)?


We haven’t specified the range of g, but is it necessary since we know the values of the 
function? Our definition included the requirement that the range be specified, so this is 
not a complete definition. On the other hand, in some discussions the range may not be 
important; for example, if g = (3,1,4) all that may matter is that the range is large enough 
to contain 1, 3 and 4. In such cases, we’ll be sloppy and accept this as if it were a complete
specification. □


¹ Some people define “range” to be the values that the function actually takes on. Most
people call that the image, a concept we will discuss a bit later.
Example 2 (Counting functions) By the Rule of Product, $|B^A| = |B|^{|A|}$. We can
represent a subset S of A by a unique function $f: A \to \{0,1\}$ where $f(x) = 0$ if $x \notin S$
and $f(x) = 1$ if $x \in S$. This proves that there are $2^{|A|}$ such subsets. For example, if
$A = \{a,b,d\}$, then the number of subsets of A is $2^{|\{a,b,d\}|} = 2^3 = 8$.


We can represent a multiset S formed from A by a unique function $f: A \to \mathbb{N} = \{0,1,2,\ldots\}$
where f(x) is the number of times x appears in S. If no element is allowed to
appear more than k times, then we can restrict the codomain of f to be $\{0,1,\ldots,k\}$ and
so there are $(k+1)^{|A|}$ such multisets. For example, the number of multisets of $A = \{a,b,d\}$
where each element can appear at most 4 times is $(4+1)^{|A|} = 5^3 = 125$. The particular
multiset {a, a, a, d, d} is represented by the function f(a) = 3, f(b) = 0 and f(d) = 2.


We can represent a k-list of elements drawn from a set B, with repetition allowed, by
a unique function $f: k \to B$. In this representation, the list corresponds to the function
written in one-line notation. (Recall that the ordering on k is the numerical ordering.)
This proves that there are exactly $|B|^k$ such lists. For example, the number of 4-lists that
can be formed from $B = \{a,b,d\}$ is $|B|^4 = 3^4 = 81$. The 4-list (b, d, d, a) corresponds to
the function f = (b, d, d, a) in one-line notation, where the domain is 4. □
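Each of the three correspondences in Example 2 can be checked by listing the functions explicitly. The helper `all_functions` below is a name introduced for this sketch. It builds $B^A$ with `itertools.product`, choosing one value per domain element, so the counts come out as $|B|^{|A|}$.

```python
from itertools import product


def all_functions(domain, codomain):
    """Yield every function from domain to codomain (the set B^A) as a dict."""
    domain = list(domain)
    for values in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, values))


A = ["a", "b", "d"]

# Subsets of A <-> functions A -> {0, 1} (f(x) = 1 exactly when x is in the subset).
subsets = [{x for x in A if f[x] == 1} for f in all_functions(A, [0, 1])]

# Multisets of A with multiplicities at most 4 <-> functions A -> {0, 1, 2, 3, 4}.
multisets = list(all_functions(A, range(5)))

# 4-lists from B = {a, b, d} <-> functions {1, 2, 3, 4} -> B, read in one-line notation.
lists = [tuple(f[i] for i in range(1, 5)) for f in all_functions(range(1, 5), A)]

print(len(subsets), len(multisets), len(lists))  # 8 125 81
```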


Definition 3 (Types of functions) Let $f: A \to B$ be a function. (Specific examples of
these concepts are given after the definition.)


• If for every $b \in B$ there is an $a \in A$ such that $f(a) = b$, then f is called a surjection
(or an onto function). Another way to describe a surjection is to say that it takes on
each value in its range at least once.


• If $f(x) = f(y)$ implies $x = y$, then f is called an injection (or a one-to-one function).
Another way to describe an injection is to say that it takes on each value in its range
at most once. The injections from k to B correspond to k-lists of elements of B without repetitions.


• If f is both an injection and a surjection, it is called a bijection.
• The bijections in $A^A$ are called the permutations of A.


• If $f: A \to B$ is a bijection, we may talk about the inverse of f, written $f^{-1}$, which
reverses what f does. Thus $f^{-1}: B \to A$ and $f^{-1}(b)$ is that unique $a \in A$ such that
$f(a) = b$. Note that $f(f^{-1}(b)) = b$ and $f^{-1}(f(a)) = a$.²


Example 3 (Types of functions) Let A= {1,2,3} and B = {a,b} be the domain and 
range of the function f = (a,b,a). The function is a surjection because every element of 
the range is “hit” by the function. It is not an injection because a is hit twice. 


Now consider the function g with domain B and range A given by g(a) = 3 and 
g(b) = 1. It is not a surjection because it misses 2; however, it is an injection because each 
element of A is hit at most once. 


Neither f nor g is a bijection because some element of the range is either hit more than
once or is missed. The function h with domain B and range C = {1, 3} given by h(a) = 3
and h(b) = 1 is a bijection. At first, it may look like g and h are the same function. They
are not because they have different ranges. You can tell if a function is an injection without
knowing its range, but you must know its range to decide if it is a surjection.


² Do not confuse $f^{-1}$ with $1/f$. For example, if $f: \mathbb{R} \to \mathbb{R}$ is given by $f(x) = x^3 + 1$,
then $1/f(x) = 1/(x^3 + 1)$ and $f^{-1}(y) = (y - 1)^{1/3}$.


The inverse of the bijection h has domain C and range B; it is given by $h^{-1}(1) = b$ and
$h^{-1}(3) = a$.


The function f with domain and range {a, b, c, d} given in 2-line form by
$$f = \begin{pmatrix} a & b & c & d \\ b & c & a & d \end{pmatrix}$$
is a permutation. You can see this immediately because the domain equals the range and
the bottom line of the 2-line form is a rearrangement of the top line. The 2-line form is
convenient for writing the inverse: just switch the top and bottom lines. In this example,
$$f^{-1} = \begin{pmatrix} b & c & a & d \\ a & b & c & d \end{pmatrix}.$$
□
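These definitions translate directly into checks on a function stored as a dict. The helper names in this sketch are hypothetical. `is_injection` needs only the rule, but `is_surjection` must also be told the range. This matches the remark above that g and h differ only in their ranges.

```python
def is_injection(f):
    """f is a dict; it is an injection if no value is taken on twice."""
    return len(set(f.values())) == len(f)


def is_surjection(f, codomain):
    """f is a surjection onto codomain if every element of codomain is a value of f."""
    return set(codomain) <= set(f.values())


def is_bijection(f, codomain):
    return is_injection(f) and is_surjection(f, codomain)


def inverse(f, codomain):
    """The inverse of a bijection: swap each pair x -> f(x) (switch the two lines)."""
    if not is_bijection(f, codomain):
        raise ValueError("only a bijection has an inverse function")
    return {y: x for x, y in f.items()}


# The functions f, g and h of Example 3.
f = {1: "a", 2: "b", 3: "a"}  # range {a, b}
g = {"a": 3, "b": 1}          # range A = {1, 2, 3}
h = {"a": 3, "b": 1}          # same rule as g, but range C = {1, 3}

print(is_surjection(f, "ab"), is_injection(f))       # True False
print(is_surjection(g, {1, 2, 3}), is_injection(g))  # False True
print(is_bijection(h, {1, 3}), inverse(h, {1, 3}))   # True {3: 'a', 1: 'b'}
```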


Example 4 (Functions as relations) There is another important set-theoretic way of
defining functions. Let A and B be sets. A relation from A to B is a subset of $A \times B$. For
example:


If A = 3 and B = 4, then R = {(1, 4), (1, 2), (3, 3), (2, 3)} is a relation from A to B.
If the relation R satisfies the condition that, for all $x \in A$ there is a unique $y \in B$ such
that $(x,y) \in R$, then the relation R is called a functional relation. In the notation from
logic, this can be written

$$\forall x \in A \;\; \exists!\, y \in B \;\ni\; (x,y) \in R.$$
This mathematical shorthand is well worth knowing:
• “∀” means “for all”,
• “∃” means “there exists”,
• “∃!” means “there exists a unique”, and
• “∋” means “such that.”


In algebra or calculus, when you draw a graph of a real-valued function $f: \mathbb{R} \to \mathbb{R}$ (such
as $f(x) = x^3$), you are attempting a pictorial representation of the set $\{(x, f(x)) : x \in \mathbb{R}\}$,
which is the subset of $\mathbb{R} \times \mathbb{R}$ that is the “functional relation from R to R.” In general, if
$R \subseteq A \times B$ is a functional relation, then the function f corresponding to R has domain A
and codomain B and is given by the ordered pairs $\{(x, f(x)) \mid x \in A\} = R$.


If you think of the “envelope game,” Example 1, you will realize that a functional 
relation is yet another thing you might find in the envelope that describes a function. 
When a subset is defined it is formally required in mathematics that the “universal set” 
from which it has been extracted to form a subset also be described. Thus, in the envelope, 
in addition to R, you must also find enough information to describe completely A x B. As 
you can see, a function can be described by a variety of different “data structures.” 


Given any relation $R \subseteq A \times B$, the inverse relation $R^{-1}$ from B to A is defined to be
$\{(y, x) : (x, y) \in R\}$. Recall the example in the previous paragraph where A = 3, B = 4, and
R = {(1, 4), (1, 2), (3, 3), (2, 3)}. The inverse relation is $R^{-1} = \{(4, 1), (2, 1), (3, 3), (3, 2)\}$.
Notice that all we’ve had to do is reverse the order of the elements in the ordered pairs
(1, 4), ..., (2, 3) of R to obtain the ordered pairs (4, 1), ..., (3, 2) of $R^{-1}$.


Note that neither R nor $R^{-1}$ is a functional relation in the example in the previous
paragraph. You should make sure you understand why this statement is true (Hint: R fails
the “∃!” test and $R^{-1}$ fails the “∀” part of the definition of a functional relation). Note
also that if both R and $R^{-1}$ are functional relations then |A| = |B|. In this case, R (and
$R^{-1}$) are bijections in the sense of Definition 3. □
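The ∀/∃! test becomes a short check once a relation is stored as a Python set of pairs. The function names in this sketch are hypothetical. S is an extra functional relation added only for comparison with R.

```python
def is_relation(relation, A, B):
    """A relation from A to B is any subset of A x B."""
    return all(x in A and y in B for x, y in relation)


def is_functional(relation, A, B):
    """For all x in A there exists a unique y in B with (x, y) in the relation."""
    return is_relation(relation, A, B) and all(
        sum(1 for u, _ in relation if u == x) == 1 for x in A
    )


def inverse_relation(relation):
    """R^{-1} = {(y, x) : (x, y) in R}."""
    return {(y, x) for x, y in relation}


A = {1, 2, 3}     # the set 3
B = {1, 2, 3, 4}  # the set 4
R = {(1, 4), (1, 2), (3, 3), (2, 3)}
R_inv = inverse_relation(R)

print(sorted(R_inv))                                        # [(2, 1), (3, 2), (3, 3), (4, 1)]
print(is_functional(R, A, B), is_functional(R_inv, B, A))  # False False

# A functional relation determines a function; dict() turns its pairs into the rule.
S = {(1, 4), (2, 3), (3, 3)}
print(is_functional(S, A, B), dict(S) == {1: 4, 2: 3, 3: 3})  # True True
```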


Example 5 (Two-line notation) Since one-line notation is a simple, brief way to specify 
functions, we’ll use it frequently. If the domain is not a set of numbers, the notation is 
poor because we must first pause and order the domain. There are other ways to write 
functions which overcome this problem. For example, we could write f(a) = 4, f(b) = 3, 
f(c) =4 and f(d) =1. This could be shortened up somewhat to 


a → 4, b → 3, c → 4 and d → 1.


By turning each of these sideways, we can shorten it even more: $\begin{pmatrix} a & b & c & d \\ 4 & 3 & 4 & 1 \end{pmatrix}$. For
obvious reasons, this is called two-line notation. Since x always appears directly over
f(x), there is no need to order the domain; in fact, we need not even specify the domain
separately since it is given by the top line. If the function is a bijection, its inverse function
is obtained by interchanging the top and bottom lines.


The arrows we introduced in the last paragraph can be used to help visualize different 
properties of functions. Imagine that you’ve listed the elements of the domain A in one 
column and the elements of the range B in another column to the right of the domain. 
Draw an arrow from a to b if f(a) = b. Thus the heads of arrows are labeled with elements
of B and the tails with elements of A. Here are some arrow diagrams. 


[Figure: three arrow diagrams, each with the domain A = {1, 2, 3} in a left column and a range B in a right column.]


In all three functions, the domain A = {1, 2,3}; however, the range B is different for each 
function. Since each diagram represents a function f, no two arrows have the same tail. If 
f is an injection, no two arrows have the same head. Thus the second and third diagrams 
are injections, but the first is not. If f is a surjection, every element of B is on the head 
of some arrow. Thus the first and third diagrams are surjections, but the second is not. 
Since the third diagram is both an injection and a surjection, it is a bijection. You should 
be able to describe the situation with the arrowheads when f is a bijection. How can you 
tell if a diagram represents a permutation? □
Exercises for Section 1


1.1. This exercise lets you check your understanding of the definitions. In each case
below, some information about a function is given to you. Answer the following
questions and give reasons for your answers:


• Have you been given enough information to specify the function?


• Can you tell whether or not the function is an injection? a surjection?
a bijection?


• If possible, give the function in two-line form.
(a) fessor, f = (3,1,2,3).
(b) fede Kis, fH=G< 4:
(c) fea, 253, 134, 32.


1.2. Let A and B be finite sets and $f: A \to B$. Prove the following claims. Some are
practically restatements of the definitions, some require a few steps.
(a) If f is an injection, then $|A| \le |B|$.
(b) If f is a surjection, then $|A| \ge |B|$.
(c) If f is a bijection, then |A| = |B|.
(d) If |A| = |B|, then f is an injection if and only if it is a surjection.
(e) If |A| = |B|, then f is a bijection if and only if it is an injection or it is a
surjection.


1.3. Let S be the set of students attending a large university, let J be the set of student
ID numbers for those students, let D be the set of dates for the past 100 years 
(month/day/year), let G be the set of 16 possible grade point averages between 2.0 
and 3.5, rounded to the nearest tenth. For each of the following, decide whether or 
not it is a function. If it is, decide whether it is an injection, bijection or surjection. 
Give reasons for your answers. 


(a) The domain is S, the codomain is J and the function maps each student to his 
or her ID number. 


(b) The domain is S, the codomain is D and the function maps each student to 
his or her birthday. 


(c) The domain is D, the codomain is J and the function maps each date to the 
ID number of a student born on that date. If there is more than one such 
student, the lexicographically least ID number is chosen. 


(d) The domain is S, the codomain is G and the function maps each student to 
his or her GPA rounded to the nearest tenth. 
(e) The domain is G, the codomain is J and the function maps each GPA to the
ID number of a student with that GPA. If there is more than one such student, 
the lexicographically least ID number is chosen. 


1.4. Let A = {1, 2, 3} and B = {a, b, d}. Consider the following sets.


{(3,@), (2,6), (1,@)}, (3,¢ 
{(1,@), (2,6), (,d)}, — {C,@), (2,6) (3,4) 


Which of them are relations on $A \times B$? Which of them are functional relations?
Which of their inverses are functional relations?


Section 2: Permutations 


Before beginning our discussion, we need the notion of composition of functions. Suppose
that f and g are two functions such that the values f takes on are contained in the domain
of g. We can write this as $f: A \to B$ and $g: C \to D$ where $f(a) \in C$ for all $a \in A$. We
define the composition of g and f, written $gf: A \to D$, by $(gf)(x) = g(f(x))$ for all $x \in A$.
The notation $g \circ f$ is also used to denote composition. Suppose that f and g are given in
two-line notation by


po(? ars (PQRSTUYV 
“\P RT U a; ae ae ae ae ee ee 


Then gf = G . ; ) To derive (gf)(p), we noted that f(p) = P and g(P) = 1. The 


other values of gf were derived similarly. 


The set of permutations on a set A is denoted in various ways in the literature. Two 
notations are PER(A) and S(A). Suppose that f and g are permutations of a set A. Recall 
that a permutation is a bijection from a set to itself and so it makes sense to talk about $f^{-1}$
and fg. We claim that fg and $f^{-1}$ are also permutations of A. This is easy to see if you
write the permutations in two-line form and note that the second line is a rearrangement 
of the first if and only if the function is a permutation. You may want to look ahead at the 
next example which illustrates these ideas. 


The permutation f given by f(a) = a for all $a \in A$ is called the identity permutation.
Notice that $f \circ f^{-1}$ and $f^{-1} \circ f$ both equal the identity permutation. You should be able
to show that, if f is any permutation of A and e is the identity permutation of A, then


$$f \circ e = e \circ f = f.$$


Again suppose that f is a permutation. Instead of $f \circ f$ or ff we write $f^2$. Note that
$f^2(x)$ is not $(f(x))^2$. (In fact, if multiplication is not defined in A, $(f(x))^2$ has no meaning.)
We could compose three copies of f. The result is written $f^3$. In general, we can compose
k copies of f to obtain $f^k$. A cautious reader may be concerned that $f \circ (f \circ f)$ may not
be the same as $(f \circ f) \circ f$. They are equal. In fact, $f^{k+m} = f^k \circ f^m$ for all nonnegative
integers k and m, where $f^0$ is defined by $f^0(x) = x$ for all x in the domain. This is based
on the “associative law” which states that $f \circ (g \circ h) = (f \circ g) \circ h$ whenever the compositions
make sense. We’ll prove these results.


To prove that two functions are equal, it suffices to prove that they take on the
same values for all x in the domain. Let’s use this idea for $f \circ (g \circ h)$ and $(f \circ g) \circ h$. We
have


$$\begin{aligned}
(f \circ (g \circ h))(x) &= f((g \circ h)(x)) && \text{by the definition of } \circ,\\
&= f(g(h(x))) && \text{by the definition of } \circ.
\end{aligned}$$
Similarly
$$\begin{aligned}
((f \circ g) \circ h)(x) &= (f \circ g)(h(x)) && \text{by the definition of } \circ,\\
&= f(g(h(x))) && \text{by the definition of } \circ.
\end{aligned}$$


More generally, one can use this approach to prove by induction that $f_1 \circ f_2 \circ \cdots \circ f_n$ is well
defined. This result then implies that $f^{k+m} = f^k \circ f^m$. Note that we have proved the
associative law for any three functions f, g and h for which the domain of f contains the
values taken on by g and the domain of g contains the values taken on by h.
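A random spot-check is no substitute for the proof above, but it is a useful sanity test. This sketch stores functions as dicts. `compose(g, f)` follows the text's convention that gf means "apply f first". `power` builds $f^k$ by repeated composition, starting from $f^0$, the identity. All names are introduced only for this sketch.

```python
import random


def compose(g, f):
    """Return gf = g∘f as a dict, where (gf)(x) = g(f(x)); f's values must lie in g's domain."""
    if not set(f.values()) <= set(g):
        raise ValueError("the values of f must lie in the domain of g")
    return {x: g[f[x]] for x in f}


def power(f, k):
    """f^k for a function f from a set to itself; f^0 is the identity."""
    result = {x: x for x in f}
    for _ in range(k):
        result = compose(f, result)
    return result


# Spot-check the associative law and f^(k+m) = f^k ∘ f^m on random maps of a 6-element set.
rng = random.Random(1)  # fixed seed so the run is reproducible
X = range(6)
for _ in range(200):
    f, g, h = ({x: rng.choice(X) for x in X} for _ in range(3))
    assert compose(f, compose(g, h)) == compose(compose(f, g), h)
    for k in range(4):
        for m in range(4):
            assert power(f, k + m) == compose(power(f, k), power(f, m))
print("associative law and power rule hold on all samples")
```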


Example 6 (Composing permutations) We’ll use the notation. Let f and g be the 


permutations 
pf l 234.8 ae oe ae 
ih eee ae ea Pe se Oe Be ee, ay 


We can compute fg by calculating all the values. This can be done fairly easily from the 
two-line form: For example, (fg)(1) can be found by noting that the image of 1 under g is 
2 and the image of 2 under f is 1. Thus (fg)(1) = 1. You should be able to verify that 


12 3 4 5 12 3 4 5 
Rao) f-(5 331 4)#fo 
and that 
5 fll SoS AOS gl file Be Seo OB. 
eee ek mee ar ee P= \1 93 4 5)* 
Note that it is easy to get the inverse, simply interchange the two lines. Thus 


3. foal 4 8 el a a ae ge eae ne ee a, 
f a ewe which is the same as f Aiea ig oe af 3 


since the order of the columns in two-line form does not matter. □


Let f be a permutation of the set A and let n = |A|. If $x \in A$, we can look at the
sequence


$$x,\; f(x),\; f(f(x)),\; \ldots,\; f^k(x),\; \ldots,$$


which is often written as
$$x \to f(x) \to f(f(x)) \to \cdots \to f^k(x) \to \cdots.$$
Since the range of f has n elements, this sequence will contain a repeated element in the
first n + 1 entries. Suppose that $f^s(x)$ is the first sequence entry that is ever repeated and
that $f^{s+p}(x)$ is the first time that it is repeated. Thus $f^s(x) = f^{s+p}(x)$. Apply $(f^{-1})^s$ to
both sides of this equality to obtain $x = f^p(x)$ and so, in fact, s = 0. It follows that the
sequence cycles through a pattern of length p forever since


$$f^{p+1}(x) = f(f^p(x)) = f(x), \quad f^{p+2}(x) = f^2(f^p(x)) = f^2(x), \quad \text{and so on.}$$


We call $(x, f(x), \ldots, f^{p-1}(x))$ the cycle containing x and call p the length of the cycle. If
a cycle has length p, we call it a p-cycle.³ Cyclic shifts of a cycle are considered the same;
for example, if (1,2,6,3) is the cycle containing 1 (as well as 2, 3 and 6), then (2,6,3,1), 
(6,3,1,2) and (3,1,2,6) are other ways of writing the cycle (1,2,6,3). A cycle looks like a 
function in one-line notation. How can we tell them apart? Either we will be told or it will 
be clear from the context. 


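Example 7 (Using cycle notation) Consider the permutation
$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 4 & 8 & 1 & 5 & 9 & 3 & 7 & 6 \end{pmatrix}.$$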


Since 1 → 2 → 4 → 1, the cycle containing 1 is (1,2,4). We could equally well write it
(2,4,1) or (4,1,2); however, (1,4,2) is a different cycle since it corresponds to 1 → 4 → 2 → 1.
The usual convention is to list the cycle starting with its smallest element. The cycles of f
are (1,2,4), (3,8,7), (5) and (6,9). We write f in cycle form as


f = (1,2,4)(3,8,7)(5)(6,9).


It is common practice to omit the cycles of length one and write f = (1,2,4)(3,8,7)(6,9).
The inverse of f is obtained by reading the cycles backwards because $f^{-1}(x)$ is the lefthand
neighbor of x in a cycle. Thus


$f^{-1} = (4,2,1)(7,8,3)(9,6) = (1,4,2)(3,7,8)(6,9)$. □
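The procedure of following x → f(x) → f(f(x)) → ⋯ until it returns to x is easy to code. In this sketch the function names are hypothetical. Starting each cycle at the smallest element not yet seen gives the usual convention. The inverse is built by sending each element to its left-hand neighbor, as described above. The dict for f is read off from the cycles (1,2,4), (3,8,7), (5), (6,9) listed in the example.

```python
def cycles(perm):
    """Cycle form of a permutation given as a dict; each cycle starts at its smallest element."""
    seen = set()
    result = []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        x = perm[start]
        while x != start:  # follow start -> f(start) -> ... back to start
            cycle.append(x)
            seen.add(x)
            x = perm[x]
        result.append(tuple(cycle))
    return result


def inverse_from_cycles(cycle_list):
    """Invert by reading each cycle backwards: f^{-1}(x) is x's left-hand neighbor."""
    inv = {}
    for cycle in cycle_list:
        for i, x in enumerate(cycle):
            inv[x] = cycle[i - 1]  # index -1 wraps around to the last element
    return inv


# f of Example 7: 1->2, 2->4, 3->8, 4->1, 5->5, 6->9, 7->3, 8->7, 9->6
f = dict(zip(range(1, 10), [2, 4, 8, 1, 5, 9, 3, 7, 6]))
print(cycles(f))                               # [(1, 2, 4), (3, 8, 7), (5,), (6, 9)]
print(cycles(inverse_from_cycles(cycles(f))))  # [(1, 4, 2), (3, 7, 8), (5,), (6, 9)]
```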


Cycle form is useful in certain aspects of the branch of mathematics called “finite group 
theory.” Here’s an application. 


³ If $(a_1, a_2, \ldots, a_p)$ is a cycle of f, then


$$f(a_1) = a_2,\; f(a_2) = a_3,\; \ldots,\; f(a_{p-1}) = a_p,\; f(a_p) = a_1.$$
Example 8 (Powers of permutations) With a permutation in cycle form, it’s very easy
to calculate a power of the permutation. For example, suppose we want the tenth power
of the permutation whose cycle form (including cycles of length 1) is (1,5,3)(7)(2,6). To
find the image of 1, we take ten steps: 1 → 5 → 3 → 1 → ⋯. Where does it stop after ten
steps? Since three steps bring us back to where we started (because 1 is in a cycle of length
three), nine steps take us around the cycle three times and the tenth takes us to 5. Thus
1 → 5 in the tenth power. Similarly, 5 → 3 and 3 → 1. Clearly 7 → 7 regardless of the
power. Ten steps take us around the cycle (2,6) exactly five times, so 2 → 2 and 6 → 6.
Thus the tenth power is (1,5,3)(7)(2)(6). □


Suppose we have a permutation in cycle form whose cycle lengths all divide k. The 
reasoning in the previous example shows that the kth power of that permutation will be the 
identity permutation; that is, all the cycles will be 1-long and so every element is mapped to 
itself (i.e., $f^k(x) = x$ for all x). In particular, if we are considering permutations of an n-set,
every cycle has length at most n and so we can take k = n!, regardless of the permutation. 
We have shown 


Theorem 1 (A fixed power of n-permutations is the identity) Given a set S, there
are k > 0 depending on |S| such that $f^k$ is the identity permutation for every permutation
f of S. Furthermore, k = |S|! is one such k.


*Example 9 (Involutions) An involution is a permutation which is equal to its inverse.
Since $f(x) = f^{-1}(x)$, we have $f^2(x) = f(f^{-1}(x)) = x$. Thus involutions are those permutations
which have all their cycles of lengths one and two. How many involutions are there
on n? Let’s count the involutions with exactly k 2-cycles and use the Rule of Sum to add
up the results. We can build such an involution as follows:


• Select 2k elements for the 2-cycles AND
• partition these 2k elements into k blocks that are all of size 2 AND
• put the remaining n − 2k elements into 1-cycles.


Since there is just one 2-cycle on two given elements, we can interpret each block as a
2-cycle. This specifies f. The number of ways to carry out the first step is $\binom{n}{2k}$. For the
second step, we might try the multinomial coefficient $\binom{2k}{2,2,\ldots,2} = (2k)!/2^k$. This is almost
right! In using the multinomial coefficient, we’re assuming an ordering on the pairs even
though they don’t have one. For example, with k = 3 and the set 6, there are just 15
possible partitions as follows.


{{1,2}, {3,4}, {5,6}}
{{1,3}, {2,4}, {5,6}}
{{1,4}, {2,3}, {5,6}}
{{1,5}, {2,3}, {4,6}}
{{1,6}, {2,3}, {4,5}}

{{1,2}, {3,5}, {4,6}}
{{1,3}, {2,5}, {4,6}}
{{1,4}, {2,5}, {3,6}}
{{1,5}, {2,4}, {3,6}}
{{1,6}, {2,4}, {3,5}}

{{1,2}, {3,6}, {4,5}}
{{1,3}, {2,6}, {4,5}}
{{1,4}, {2,6}, {3,5}}
{{1,5}, {2,6}, {3,4}}
{{1,6}, {2,5}, {3,4}}


This is smaller than $\binom{6}{2,2,2}$ = 6!/(2! 2! 2!) = 90 because the multinomial coefficient
counts each partition once for each of the 3! = 6 ways to order its three blocks, and
15 × 6 = 90. This is because we've chosen a first, second and third block instead of simply
dividing 6 into three blocks of size two.


How can we solve the dilemma? Actually, the discussion of what went wrong contains
the key to the solution: The multinomial coefficient counts ordered collections of k blocks
and we want unordered collections. Since the blocks in a partition are all distinct, there
are k! ways to order the blocks and so the multinomial coefficient counts each unordered
collection k! times. Thus we must simply divide the multinomial coefficient by k!. If this
dividing by k! bothers you, try looking at it this way. Let f(k) be the number of ways to
carry out the second step, partitioning the 2k elements into k blocks that are all of size 2.
Since the k blocks can be permuted in k! ways, the Rule of Product tells us that there are
f(k) k! ways to select k ordered blocks of 2 elements each. Thus f(k) k! = (2k)!/2^k, and so
f(k) = (2k)!/(2^k k!).
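The division by k! can be seen concretely by listing the partitions. This sketch (function names are our own, not the book's) generates the partitions of a 2k-set into blocks of size 2 by always pairing the smallest remaining element with some other element, so each unordered partition is produced exactly once; it then compares the count with (2k)!/(2^k k!).

```python
from math import factorial

def pair_partitions(elements):
    """Yield every partition of the list `elements` into blocks of size 2, as lists of pairs."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        # Pair the smallest remaining element with `partner`, then partition what is left.
        remaining = rest[:i] + rest[i + 1:]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail

def count_formula(k):
    """f(k) = (2k)! / (2^k k!): the multinomial coefficient divided by k!."""
    return factorial(2 * k) // (2 ** k * factorial(k))

parts = list(pair_partitions([1, 2, 3, 4, 5, 6]))
print(len(parts), count_formula(3))  # 15 15
```

Fixing the smallest element's partner first is what removes the k! orderings: no "first block" is ever chosen freely.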


Since there is just one way to carry out Step 3, the Rule of Product tells us that the
number of involutions with exactly k 2-cycles is

$$\binom{n}{2k}\,\frac{(2k)!}{2^k\,k!}.$$

Simplifying and using the Rule of Sum to combine the various possible values of k, we
obtain a formula for involutions.


We have just proved: The number of involutions of n is

$$\sum_{k=0}^{\lfloor n/2\rfloor} \frac{n!}{(n-2k)!\,2^k\,k!},$$

where ⌊n/2⌋ denotes the largest integer less than or equal to n/2. Let's use this to compute
the number of involutions when n = 6. Since ⌊6/2⌋ = 3, the sum has four terms:

$$\frac{6!}{6!\,2^0\,0!} + \frac{6!}{4!\,2^1\,1!} + \frac{6!}{2!\,2^2\,2!} + \frac{6!}{0!\,2^3\,3!}
= 1 + \frac{6\times5}{2} + \frac{6\times5\times4\times3}{8} + \frac{6\times5\times4}{8}
= 1 + 15 + 45 + 15 = 76.$$


The last term in the sum, namely k = 3, corresponds to those involutions with three 2-cycles
(and hence no 1-cycles). Thus it counts the 15 partitions listed earlier in this example. □
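The formula can be compared with a direct count for small n. In this sketch (names are our own), involutions_formula evaluates the sum above, and involutions_brute_force counts permutations of {0, ..., n−1} (0-based for convenience) that satisfy f(f(x)) = x for every x.

```python
from itertools import permutations
from math import factorial

def involutions_formula(n):
    """Sum over k = 0..floor(n/2) of n! / ((n - 2k)! 2^k k!)."""
    return sum(factorial(n) // (factorial(n - 2 * k) * 2 ** k * factorial(k))
               for k in range(n // 2 + 1))

def involutions_brute_force(n):
    """Count permutations f of {0, ..., n-1} with f(f(x)) = x for every x."""
    return sum(all(f[f[x]] == x for x in range(n)) for f in permutations(range(n)))

print(involutions_formula(6))  # 76
```

Each term is an integer because it equals $\binom{n}{2k}$ times the number of pair partitions, so integer division is exact.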


If you’re familiar with the basic operations associated with matrices, the following 
example gives a correspondence between matrix multiplication and composition of permu- 
tations. 


*Example 10 (Permutation matrices) Suppose f and g are permutations of n. We
can define an n × n matrix F to consist of zeroes except that the (i, j)th entry, F_{i,j}, equals
one whenever f(j) = i. Define G similarly. Then

$$(FG)_{i,j} = \sum_{k=1}^{n} F_{i,k}\,G_{k,j} = F_{i,g(j)},$$


since G_{k,j} = 0 except when g(j) = k. By the definition of F, this entry of F is zero unless
f(g(j)) = i. Thus (FG)_{i,j} is zero unless (fg)(j) = i, in which case it is one. We've proven
that FG corresponds to fg. In other words:

Composition of permutations corresponds to multiplication of matrices.
It is also easy to prove that f^{-1} corresponds to F^{-1}. Using this correspondence, we can
prove things such as (fg)^{-1} = g^{-1} f^{-1} and (f^k)^{-1} = (f^{-1})^k by noting that they are true
for matrices F and G.


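As an example, let f and g be the permutations

$$f = \begin{pmatrix}1&2&3&4&5\\2&1&4&5&3\end{pmatrix} \quad\text{and}\quad g = \begin{pmatrix}1&2&3&4&5\\2&3&4&5&1\end{pmatrix}.$$

We computed fg in Example 6. We obtained

$$fg = \begin{pmatrix}1&2&3&4&5\\1&4&5&3&2\end{pmatrix}.$$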


Using our correspondence, we obtain

$$F = \begin{pmatrix}0&1&0&0&0\\1&0&0&0&0\\0&0&0&0&1\\0&0&1&0&0\\0&0&0&1&0\end{pmatrix}
\quad\text{and}\quad
G = \begin{pmatrix}0&0&0&0&1\\1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&1&0\end{pmatrix}.$$


You should multiply these two matrices together and verify that you get the matrix FG
corresponding to

$$fg = \begin{pmatrix}1&2&3&4&5\\1&4&5&3&2\end{pmatrix}.$$

□
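The suggested multiplication can also be done by machine. This sketch uses plain Python lists rather than a matrix library; the one-line forms of f and g are the ones read off from the matrices F and G in this example (column j has its 1 in row f(j)), and the helper names are hypothetical.

```python
def perm_matrix(f):
    """Matrix F with F[i][j] = 1 exactly when f(j) = i (indices shifted to start at 0)."""
    n = len(f)
    return [[1 if f[j] == i + 1 else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Ordinary product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

f = (2, 1, 4, 5, 3)                         # one-line form of f
g = (2, 3, 4, 5, 1)                         # one-line form of g
fg = tuple(f[g[x] - 1] for x in range(5))   # (fg)(x) = f(g(x))

print(fg)                                                          # (1, 4, 5, 3, 2)
print(mat_mul(perm_matrix(f), perm_matrix(g)) == perm_matrix(fg))  # True
```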


*Example 11 (Derangements) A derangement is a permutation f with no fixed points;
i.e., f(x) ≠ x for all x. We first show that the probability is 1/n that a permutation f,
selected uniformly at random from all permutations of n, has f(k) = k. If f(k) = k, then
the elements of n − {k} can be permuted in any fashion. This can be done in (n − 1)! ways.
Thus, (n − 1)! is the cardinality of the set of all permutations of n that satisfy f(k) = k.
Since there are n! permutations, the probability that f(k) = k is (n − 1)!/n! = 1/n.
Hence the probability that f(k) ≠ k is 1 − 1/n. If we toss a coin with probability p of
heads for n tosses, the probability that no heads occurs in n tosses is (1 − p)^n. This is
because each toss is “independent” of the prior tosses. If we, incorrectly, treat the n events
f(1) ≠ 1, ..., f(n) ≠ n as independent in this same sense, the probability that f(k) ≠ k for
k = 1, ..., n would be (1 − 1/n)^n. One of the standard results in calculus is that (1 − 1/n)^n
approaches 1/e as n → ∞. (You can prove it by writing (1 − 1/n)^n = exp(ln(1 − 1/n)/(1/n)),
setting 1/n = x and using l'Hôpital's Rule.) Thus, we might expect approximately n!/e
derangements of n for large n. Although our argument is wrong, the result is right! We
get partial credit for this example. □
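A brute-force count supports the conclusion. The sketch below counts permutations of {0, ..., n−1} with no fixed points and compares the result with n!/e rounded to the nearest integer. That the rounding is exact for every n ≥ 1 is a known fact about derangements which the heuristic argument above does not prove; the code only checks it for small n.

```python
from itertools import permutations
from math import e, factorial

def derangements(n):
    """Count permutations f of {0, ..., n-1} with f(x) != x for every x."""
    return sum(all(f[x] != x for x in range(n)) for f in permutations(range(n)))

for n in range(1, 8):
    print(n, derangements(n), round(factorial(n) / e))
```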
Exercises for Section 2


2.1. This exercise lets you check your understanding of cycle form. A permutation is
given in one-line, two-line or cycle form. Convert it to the other two forms. Give
its inverse in all three forms.


(c) (5,4,3,2,1), which is in one-line form.


(d) (5,4,3,2,1), which is in cycle form.


2.2. A carnival barker has four cups upside down in a row in front of him. He places
a pea under the cup in the first position. He quickly interchanges the cups in the
first and third positions, then the cups in the first and fourth positions and then
the cups in the second and third positions. This entire set of interchanges is done
a total of five times. Where is the pea?

Hint: Write one entire set of interchanges as a permutation in cycle form.


2.3. Let f be a permutation of n. The cycle of f that contains 1 is called the cycle
generated by 1.


(a) Prove that the number of permutations in which the cycle generated by 1 has
length n is (n − 1)!.


(b) For how many permutations does the cycle generated by 1 have length k?
(Remember that a permutation must be defined on all elements of its domain n.)


(c) If your answer to (b) is correct, when it is summed over 1 ≤ k ≤ n it should
equal n!, the total number of permutations of n. Why? Show that your answer
to (b) has the correct sum.


2.4. This exercise deals with powers of permutations. All our permutations will be
written in cycle form.


(a) Compute (1, 2,3)°°. 

(b) Compute ((1,3)(2,5,4))> 

(c) Show that for every permutation f of 5, we have f® is the identity permutation. 
What is f°!? 
Section 3: Other Combinatorial Aspects of Functions


This section contains two independent parts. The first deals with the concept of the inverse 
of a general function. The second deals with functions related to computer storage of 
unordered lists. 


The Inverse of an Arbitrary Function 


Again, let f: A → B be a function. The image of f is the set of values f actually takes on:
Image(f) = { f(a) | a ∈ A }. The definition of a surjection can be rewritten Image(f) = B
because a surjection was defined to be a function f: A → B such that, for every b ∈ B,
there is an a ∈ A with f(a) = b.


For each b ∈ B, the inverse image of b, written f^{-1}(b), is the set of those elements in A
whose image is b; i.e.,
f^{-1}(b) = { a | a ∈ A and f(a) = b }.
This extends our earlier definition of f^{-1} from bijections to all functions; however, such
an f^{-1} can't be thought of as a function from B to A unless f is a bijection because it will
not give a unique a ∈ A for each b ∈ B.⁴


Suppose f is given by the functional relation R ⊆ A × B. Then f^{-1}(b) is all those a
such that (a, b) ∈ R. Equivalently, f^{-1}(b) is all those a such that (b, a) ∈ R^{-1}.


Definition 4 (Coimage) Let f: A → B be a function. The collection of nonempty
inverse images of elements of B is called the coimage of f. In mathematical notation


Coimage(f) = { f^{-1}(b) | b ∈ B, f^{-1}(b) ≠ ∅ } = { f^{-1}(b) | b ∈ Image(f) }.


We claim that the coimage of f is the partition of A whose blocks⁵ are the maximal
subsets of A on which f is constant. For example, if f ∈ {a,b,c}^5 is given in one-line form
as (a,c,a,a,c), then


Coimage(f) = { f^{-1}(a), f^{-1}(c) } = { {1,3,4}, {2,5} },
since f is a on {1,3,4} and is c on {2,5}.


We now prove the claim. If x ∈ A, let y = f(x). Then x ∈ f^{-1}(y) and the set f^{-1}(y)
is an element of Coimage(f). Hence the union of the nonempty inverse images contains
A. Clearly it does not contain anything which is not in A. If y_1 ≠ y_2, then we cannot
have x ∈ f^{-1}(y_1) and x ∈ f^{-1}(y_2) because this would imply f(x) = y_1 and f(x) = y_2, a
contradiction of the definition of a function. Thus Coimage(f) is a partition of A. Since
the value of f(x) determines the block to which x belongs, x_1 and x_2 belong to the same
block if and only if f(x_1) = f(x_2). Hence a block is a maximal set on which f is constant.
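For a function in one-line form, the image and coimage can be computed by grouping the domain by value, which is exactly the proof's observation that f(x) determines the block of x. In this minimal sketch (the function name is hypothetical), the domain is assumed to be {1, ..., len(f)}.

```python
from collections import defaultdict

def image_and_coimage(f):
    """Return (Image(f), Coimage(f)) for f in one-line form on the domain {1, ..., len(f)}."""
    blocks = defaultdict(set)
    for x, value in enumerate(f, start=1):
        blocks[value].add(x)          # x belongs to the inverse image of f(x)
    image = set(blocks)
    coimage = [frozenset(block) for block in blocks.values()]
    return image, coimage

image, coimage = image_and_coimage(("a", "c", "a", "a", "c"))
print(sorted(image))                        # ['a', 'c']
print(sorted(sorted(b) for b in coimage))   # [[1, 3, 4], [2, 5]]
```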


⁴ There is a slight abuse of notation here: If f: A → B is a bijection, our new notation
is f^{-1}(b) = {a} and our old notation is f^{-1}(b) = a.

⁵ Recall that a partition of a set S is an unordered collection of disjoint nonempty subsets
of S whose union is S. These subsets are called the blocks of the partition.
Example 12 (f^{-1} as a function) Let f: A → B be a function. For each b ∈ B,
f^{-1}(b) = { a | a ∈ A and f(a) = b }.


Thus, for each b ∈ B, f^{-1}(b) ∈ P(A). Hence f^{-1} is a function with domain B and range
(codomain) P(A), the set of all subsets of A. This is true for any function f and does
not require f to be a bijection. For example, if f ∈ {a,b,c}^5 is given in one-line form as
(a,c,a,a,c), then f^{-1}, in two-line notation, is

$$\begin{pmatrix} a & b & c \\ \{1,3,4\} & \emptyset & \{2,5\} \end{pmatrix}.$$


If we take the domain of f^{-1} to be Image(f), instead of all of B, then f^{-1} is a bijection
from Image(f) to Coimage(f). In the case of our example (a,c,a,a,c), we get, in two-line
notation,

$$\begin{pmatrix} a & c \\ \{1,3,4\} & \{2,5\} \end{pmatrix}$$


for the image-coimage bijection associated with f^{-1}. If we are only given the coimage of
a function then we don't have enough information to specify the function. For example,
suppose we are given only that {{1,3,4},{2,5}} is the coimage of some function g with
codomain {a,b,c}. We can see immediately that the domain of g is 5. But what is g? To
specify g we need to know the elements x and y in {a,b,c} that make

$$\begin{pmatrix} x & y \\ \{1,3,4\} & \{2,5\} \end{pmatrix}$$

the correct two-line description of g^{-1} (restricted to its image). There are (3)_2 = 6 choices
for xy, namely, ab, ac, bc, ba, ca, and cb.⁶ In general, suppose f: A → B and we are given
that a particular partition of A with k blocks is the coimage of f. Then, by comparison
with our example (A = 5, B = {a,b,c}), it is easy to see that there are exactly (|B|)_k
choices for the function f. □
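The count (|B|)_k can be confirmed by listing the candidates. Assuming the coimage is supplied as a list of blocks (a representation chosen for this sketch; the function name is hypothetical), each function with that coimage comes from assigning distinct elements of B to the blocks, i.e., from an injection from the set of blocks to B.

```python
from itertools import permutations

def functions_with_coimage(blocks, codomain):
    """Yield each function (as a dict) whose coimage is `blocks`; distinct blocks get distinct values."""
    for values in permutations(codomain, len(blocks)):
        f = {}
        for block, value in zip(blocks, values):
            for x in block:
                f[x] = value
        yield f

fs = list(functions_with_coimage([{1, 3, 4}, {2, 5}], ["a", "b", "c"]))
print(len(fs))  # 6, which is (3)_2
```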


We can describe the image and coimage of a function by the arrow pictures introduced
in Example 5. Image(f) is the set of those b ∈ B which appear as labels of arrowheads. A
block in Coimage(f) is the set of labels on the tails of those arrows that all have their heads
pointing to the same value; for example, the block of Coimage(f) arising from b ∈ Image(f)
is the set of labels on the tails of those arrows pointing to b.


Example 13 (Counting functions with specified image size) How many functions
in B^A have an image with exactly k elements? You will need to recall that the symbol
S(n,k) stands for the number of partitions of a set of size n into k blocks. (The S(n,k) are
called the Stirling numbers of the second kind and are discussed in Unit CL. See the index
for page numbers.) If f ∈ B^A has k elements in its image, then the coimage of f is a
partition of A having exactly k blocks. Suppose that |A| = a and |B| = b. There

⁶ Recall that (n)_k = n(n − 1)···(n − k + 1) = n!/(n − k)! is the number of k-lists without
repeats that can be made from an n-set.
are S(a,k) ways to choose the blocks of the coimage. The partition of A does not fully
specify a function f ∈ B^A. To complete the specification, we must specify the image of the
elements in each block (Example 12). In other words, an injection from the set of k blocks
to B must be specified. This is an ordered selection of size k without replacement from B.
There are (b)_k = b!/(b − k)! such injections, independent of which k-block partition of A
we are considering. By the Rule of Product, there are S(a,k)(b)_k functions f ∈ B^A with
|Image(f)| = k. For example, when the domain is 5 and the range is {a,b,c}, the number
of functions with |Image(f)| = 2 is S(5,2)(3)_2 = 15 × 6 = 90, where the value of S(5,2)
was obtained from the table in the discussion of Stirling numbers in Unit CL. Example 12
gave one of these 90 possibilities.


We consider some special cases.
• Suppose k = a.


  - If b < a, there are no functions f with |Image(f)| = a because the size a of the
    image is at most the size b of the codomain.


  - If b ≥ a, there are (b)_a functions with |Image(f)| = a.


  - If b = a, the previous formula, (b)_a, reduces to a! and the functions are injections
    from A to B (which, since |A| = |B|, are bijections).


• Suppose k = b.


  - If b > a, there are no functions f with |Image(f)| = b because the size of the image
    is at most the size of the domain.


  - If b ≤ a, then there are S(a,b)(b)_b = S(a,b) b! functions f ∈ B^A with |Image(f)| = b.
    These functions are exactly the surjections. □
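The formula S(a,k)(b)_k can be checked against a direct count. This sketch computes Stirling numbers of the second kind with the standard recurrence S(n,k) = S(n−1,k−1) + k S(n−1,k), which is not derived in this section, and uses math.perm (Python 3.8+) for (b)_k. The brute-force count represents functions from an a-set to a b-set as one-line forms over {0, ..., b−1}.

```python
from functools import lru_cache
from itertools import product
from math import perm

@lru_cache(maxsize=None)
def stirling2(n, k):
    """S(n, k): partitions of an n-set into k blocks, via S(n,k) = S(n-1,k-1) + k S(n-1,k)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def count_by_formula(a, b, k):
    """Functions from an a-set to a b-set whose image has exactly k elements: S(a,k) (b)_k."""
    return stirling2(a, k) * perm(b, k)

def count_by_brute_force(a, b, k):
    """Count one-line forms of length a over {0, ..., b-1} that use exactly k distinct values."""
    return sum(len(set(f)) == k for f in product(range(b), repeat=a))

print(count_by_formula(5, 3, 2))  # 90
```

Taking k = b in count_by_formula gives S(a,b) b!, the number of surjections from the special cases above.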


Monotonic Lists and Unordered Lists 


In computers, all work with data structures requires that the parts of the data structure
be ordered. The most common ordered structures are arrays and linked lists.


Sometimes the order relates directly to an order associated with the corresponding
mathematical objects. For example, the one-line notation for a function is simply an ordered
list, which is an array. Thus there is a simple correspondence (i.e., bijection) between lists
and functions: A k-list from S is a function f: k → S. Thus functions (mathematical
objects) are easily stored as ordered lists (computer objects).


Sometimes the order is just an artifact of the algorithm using the structures. In other
words, the order is imposed by the designer of the algorithm. Finding such a “canonical”
ordering⁷ is essential if one wants to work with unordered objects efficiently in a computer.


⁷ In mathematics, people refer to a unique thing (or process or whatever) that has been
selected as canonical.
Since sets and multisets⁸ are basic unordered mathematical objects, it is important to
have ways of representing them in a computer. We'll discuss a canonical ordering for k-sets
and k-multisets whose elements lie in an n-set.


We need to think of a unique way to order the set or multiset, say s_1, s_2, ..., s_k, so that
we have an ordered list. (A mathematician would probably speak of a canonical ordering
of the multiset rather than a unique ordering; however, both terms are correct.)


Let's look at a small example, the 3-element multisets whose elements are chosen from
5. Here are the $\binom{7}{3}$ = 35 such multisets.⁹ An entry like 2,5,5 stands for the multiset
containing one 2 and two 5's.


1,1,1  1,1,2  1,1,3  1,1,4  1,1,5  1,2,2  1,2,3  1,2,4  1,2,5  1,3,3
1,3,4  1,3,5  1,4,4  1,4,5  1,5,5  2,2,2  2,2,3  2,2,4  2,2,5  2,3,3
2,3,4  2,3,5  2,4,4  2,4,5  2,5,5  3,3,3  3,3,4  3,3,5  3,4,4  3,4,5
3,5,5  4,4,4  4,4,5  4,5,5  5,5,5


We've simply arranged the elements in each 3-multiset to be in “weakly increasing order.”
Let (b_1, b_2, ..., b_k) be an ordered list. We say the list is in weakly increasing order if the
values are not decreasing as we move from one element to the next; that is, if
b_1 ≤ b_2 ≤ ··· ≤ b_k. The list of lists we've created can be thought of as a bijection from


(i) the 3-multisets whose elements lie in 5 to
(ii) the weakly increasing functions in 5^3 written in one-line notation.


Thus, 3-multisets with elements in 5 correspond to weakly increasing functions in 5^3. For
example, the multiset {2,5,5} corresponds to the weakly increasing function f = (2,5,5) in
one-line form.


Since we have seen that functions with domain k can be viewed as k-lists, we say that
f ∈ n^k is a weakly increasing function if its one-line form is weakly increasing; that is,
f(1) ≤ f(2) ≤ ··· ≤ f(k). In a similar fashion we say that the list b_1, b_2, ..., b_k is in


weakly decreasing order if b_1 ≥ b_2 ≥ ··· ≥ b_k;
strictly decreasing order if b_1 > b_2 > ··· > b_k;
strictly increasing order if b_1 < b_2 < ··· < b_k.


Again, this leads to similar terminology for functions. All such functions are also called 
monotone functions. 


In the bijection we gave for our 35 lists, the lists without repetition correspond to the
strictly increasing functions. Thus 3-subsets of 5 correspond to strictly increasing functions.
From our previous list of multisets, we can read off these functions in one-line form:


(1,2,3) (1,2,4) (1,2,5) (1,3,4) (1,3,5)
(1,4,5) (2,3,4) (2,3,5) (2,4,5) (3,4,5)


We can interchange the strings “decreas” and “increas” in the previous paragraphs
and read the functions in the list backwards. For example, the bijection between 3-subsets
of 5 and strictly decreasing functions is given by


(3,2,1) (4,2,1) (5,2,1) (4,3,1) (5,3,1)
(5,4,1) (4,3,2) (5,3,2) (5,4,2) (5,4,3)


⁸ Recall that a multiset is like a set except that repeated elements are allowed.
⁹ A later example explains how we got this number.
The function (4,3,1) corresponds to the 3-subset {4,3,1} = {1,3,4} of 5.


All these things are special cases of the following 2-in-1 theorem. 


Theorem 2 (Sets, unordered lists and monotone functions) Read either the top
lines in all the braces or the bottom lines in all the braces. There are bijections between
each of the following:

• $\left\{\begin{array}{c} k\text{-multisets} \\ k\text{-sets} \end{array}\right\}$ whose elements lie in n,

• the $\left\{\begin{array}{c} \text{weakly} \\ \text{strictly} \end{array}\right\}$ increasing ordered k-lists made from n,

• the $\left\{\begin{array}{c} \text{weakly} \\ \text{strictly} \end{array}\right\}$ increasing functions in n^k.

In these correspondences, the items in the list are the elements of the (multi)set and are
the values of the function.
In the correspondences, “increasing” can be replaced by “decreasing.”


For example, reading the top lines in the braces with k = 3 and n = 5 and, in the last
one, replacing “increasing” with “decreasing,” we have: There are bijections between


(a) 3-multisets whose elements lie in 5,
(b) the weakly increasing ordered 3-lists made from 5, and
(c) the weakly decreasing functions in 5^3.


In these bijections, {2,5,5} corresponds to the list (2,5,5) and the function f = (5,5,2).
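Python's itertools happens to produce exactly the monotone lists of Theorem 2: combinations_with_replacement yields weakly increasing tuples and combinations yields strictly increasing ones, both in lexicographic order. The sketch below uses that to list the 3-multisets and 3-subsets of 5; representing a multiset by a frozenset of (element, count) pairs is an assumption made only for this check.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement, product

n, k = 5, 3
values = range(1, n + 1)

# One weakly increasing k-list per k-multiset, one strictly increasing k-list per k-set.
weakly_increasing = list(combinations_with_replacement(values, k))
strictly_increasing = list(combinations(values, k))

def as_multiset(t):
    """Represent the multiset of entries of t by its element counts."""
    return frozenset(Counter(t).items())

# Every k-list from n determines a multiset; collect all of them.
all_multisets = {as_multiset(t) for t in product(values, repeat=k)}

print(len(weakly_increasing), len(all_multisets))   # 35 35
print(len(strictly_increasing))                     # 10
# Reading each strictly increasing list backwards gives the strictly decreasing functions.
print([t[::-1] for t in strictly_increasing][:3])   # [(3, 2, 1), (4, 2, 1), (5, 2, 1)]
```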


Example 14 (Counting multisets) Earlier we said there were $\binom{7}{3}$ = 35 different
3-element multisets whose elements come from 5 and gave the list


1,1,1  1,1,2  1,1,3  1,1,4  1,1,5  1,2,2  1,2,3  1,2,4  1,2,5  1,3,3
1,3,4  1,3,5  1,4,4  1,4,5  1,5,5  2,2,2  2,2,3  2,2,4  2,2,5  2,3,3
2,3,4  2,3,5  2,4,4  2,4,5  2,5,5  3,3,3  3,3,4  3,3,5  3,4,4  3,4,5
3,5,5  4,4,4  4,4,5  4,5,5  5,5,5
How did we know this? To see the trick, do the following to each 3-list:
• add 0 to the first item,
• add 1 to the second item, and
• add 2 to the third item.
Thus the first ten become


1,2,3  1,2,4  1,2,5  1,2,6  1,2,7  1,3,4  1,3,5  1,3,6  1,3,7  1,4,5


You should be able to see that you've created strictly increasing 3-lists from 7. In other
words, you have listed all subsets of 7 of size 3. We know there are $\binom{7}{3}$ = 35 such subsets
and hence there were 35 multisets in the original list. In general, suppose we have listed
all weakly increasing k-lists from n. If, as in the above example, we add the list
0, 1, ..., k − 1 to each such k-list element by element, we get the strictly increasing k-lists
from n + k − 1. By Theorem 2, this is a list of all k-subsets of n + k − 1. Thus, the number
of weakly increasing k-lists from n is $\binom{n+k-1}{k}$. By Theorem 2, this is also the number of
k-multisets from n. □
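The shift used in Example 14 is easy to express in code. This sketch (helper names are our own) adds 0, 1, ..., k−1 to each weakly increasing k-list from n and checks that the result is exactly the list of strictly increasing k-lists from n + k − 1, in the same lexicographic order; subtracting the same offsets undoes the map.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def shift_up(t):
    """Weakly increasing list -> strictly increasing list: add 0, 1, ..., k-1."""
    return tuple(x + i for i, x in enumerate(t))

def shift_down(t):
    """Inverse map: subtract 0, 1, ..., k-1."""
    return tuple(x - i for i, x in enumerate(t))

n, k = 5, 3
multisets = list(combinations_with_replacement(range(1, n + 1), k))
subsets = list(combinations(range(1, n + k), k))   # k-subsets of {1, ..., n+k-1}

print([shift_up(t) for t in multisets] == subsets)  # True
print(len(multisets), comb(n + k - 1, k))           # 35 35
```

The two lists line up position by position because adding fixed offsets preserves lexicographic order.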


Exercises for Section 3 


3.1. This exercise lets you check your understanding of the definitions. In each case
below, some information about a function is given to you. Answer the following
questions and give reasons for your answers:


• Have you been given enough information to specify the function; i.e., would
this be enough data for a function envelope?


• Can you tell whether or not the function is an injection? a surjection? a
bijection? If so, what is it?


(a) f ∈ 4^5, Coimage(f) = {{1,3,5}, {2,4}}.

(b) f ∈ 5^5, Coimage(f) = {{1}, {2}, {3}, {4}, {5}}.
peas, (2) =(1,3,5}, $7) = {2,4}. 
fer. |Image(f)| =A, 

fed, |Image(f)| =5. 

fed. |Coimage(f’)| =p: 


3.2. Let A and B be finite sets and let f: A → B be a function. Prove the following
claims.


(a) |Image(f)| = |Coimage(f)|.
(b) f is an injection if and only if |Image(f)| = |A|.
(c) f is a surjection if and only if |Coimage(f)| = |B|.


3.3. In each case we regard the specified functions in one-line form. (As strings of
integers, such functions are ordered lexicographically.)


(a) List all strictly decreasing functions in 5^3 in lexicographic order. Note that
this lists all subsets of size 3 from 5 in lexicographic order.

(b) In the list for part (a), for each string x_1x_2x_3 compute
$\binom{x_1-1}{3} + \binom{x_2-1}{2} + \binom{x_3-1}{1} + 1$. What do these integers represent in terms of the list of part (a)?


(c) List all strictly increasing functions in 5^3 in lexicographic order. Note that this
also lists all subsets of size 3 from 5 in lexicographic order.
(d) What is the analog, for part (c), of the formula of part (b)?
Hint: For each x_1x_2x_3 in the list of part (c), form the list (6 − x_1)(6 − x_2)(6 − x_3).


(e) In the lexicographic list of all strictly decreasing functions in 9^5, find the
successor and predecessor of 98321.


(f) How many elements are there before 98321 in this list?


3.4. We study the problem of listing all ways to put 5 balls (unlabeled) into 4 boxes
(labeled 1 to 4). Consider ten consecutive points, labeled 0 to 9. Some points are
to be converted to box boundaries, some to balls. Points 0 and 9 are always box
boundaries (call these exterior box boundaries). From the remaining points labeled
1 to 8, we can arbitrarily pick three points to convert to interior box boundaries and
five points to convert to balls. Here are two examples:


[Figure: two examples in which points 0 to 9 are converted to box boundaries and balls; the boxes are numbered 1 to 4.]


In this way, the placements of 5 balls into 4 boxes are made to correspond to subsets
of size 3 (the box boundaries we can select) from 8. Lexicographic order on subsets
of size 3 from 8, where the subsets are listed as strictly decreasing strings of length
3 from 8, correspondingly lex orders all placements of 5 balls into four boxes.


(a) Find the successor and predecessor of each of the above placements of balls
into boxes.


(b) In the lex order of 5 balls into 4 boxes, which placement of balls into boxes is
the last one in the first half of the list? The first one in the second half of the
list?

Hint: The formula p(x_1, x_2, x_3) = $\binom{x_1-1}{3} + \binom{x_2-1}{2} + \binom{x_3-1}{1} + 1$ gives the position
of the string x_1x_2x_3 in the list of decreasing strings of length three from 8. Try
to solve the equation p(x_1, x_2, x_3) = $\binom{8}{3}$/2 = 28 for the variables x_1, x_2, x_3.


3.5. Listing all of the partitions of an n-set can be tricky and, of course, time consuming
as there are lots of them. This exercise shows you a useful trick for small n. We
define a class of functions called restricted growth functions that have the property
that their collection of coimages is exactly the set of all partitions of n.


(a) Call a function f ∈ n^n a restricted growth function if f(1) = 1 and f(i) − 1
is at most the maximum of f(k) over all k < i. Which of the following
functions in one-line form are restricted growth functions? Give reasons for
your answers.


22. HO Ba “sh dae: (1 Bede 


(b) List, in lexicographic order, all restricted growth functions for n = 4. Use 
one-line form and, for each one, list its coimage partition. 
(c) For n = 5, list in lexicographic order the first fifteen restricted growth functions.
Use one-line form. For the functions in positions 5, 10, and 15, list their
coimage partitions.


3.6. How many functions f are there from 6 to 5 with |Image(f)| = 3? 


3.7. How many ways can 6 different balls be placed into 5 labeled cartons in such a way 
that exactly 2 of the cartons contain no balls? 


3.8. Count each of the following 
(a) the number of multisets of size 6 whose elements lie in {a, b,c, d}, 
(b) the number of weakly increasing functions from 6 to 4, 
(c) the number of weakly decreasing ordered 6-lists made from 4, 


(d) the number of strictly increasing functions from 6 to 9. 


Section 4: Functions and Probability 


In this section we look at various types of functions that occur in elementary probability 
theory. For the most part, we deal with finite sets. The functions we shall define are not 
difficult to understand, but they do have special names and terminology common to the 
subject of probability theory. We describe these various functions with a series of examples. 


Probability functions have already been encountered in our studies. To review this
idea, let U be a finite sample space (that is, a finite set) and let P be a function from
U to R such that P(t) ≥ 0 for all t ∈ U and $\sum_{t\in U} P(t) = 1$. Then P is called a
probability function on U. For any event E ⊆ U, define P(E) = $\sum_{t\in E} P(t)$. P(E) is called
the probability of the event E. The pair (U, P) is called a probability space.
Random Variables


Consider tossing a coin. The result is either heads or tails, which we denote by H and
T. Thus the sample space is {H,T}. Sometimes we want to associate numerical values
with elements of the sample space. Such functions are called “random variables.” The
function X: {H,T} → R, defined by X(H) = 1, X(T) = 0, is a random variable. Likewise
the function Y(H) = 1, Y(T) = −1, same domain and range, is a random variable.


As another example, consider the sample space U = {H,T} × {H,T} × {H,T} × {H,T},
which contains the possible results of four coin tosses. The random variable
X(t_1, t_2, t_3, t_4) = |{i | t_i = H}| counts the number of times H appears in the sequence
t_1, t_2, t_3, t_4. The function X has Image(X) = {0,1,2,3,4}, which is a subset of the set of real numbers.


Definition 5 (Random variable) Let (U, P) be a probability space, and let g: U → R
be a function with domain U and range (codomain) R, the real numbers. Such a function
g is called a random variable on (U, P). The term “random variable” informs us that the
range is the set of real numbers and that, in addition to the domain U, we also have a
probability function P.


Random variables are usually denoted by capital letters near the end of the alphabet. Thus, 
instead of g in the definition of random variable, most texts would use X. 


By combining the two concepts, random variable and probability function, we obtain 
one of the most important definitions in elementary probability theory, that of a distribution 
function. 


Definition 6 (Distribution function of a random variable) Let X: U → R be a
random variable on a sample space U with probability function P. For each real number
t ∈ Image(X), let X^{-1}(t) be the inverse image of t. Define a function f_X: Image(X) → R
by f_X(t) = P(X^{-1}(t)). The function f_X is called the probability distribution function of
the random variable X. The distribution function is also called the density function.


Since P(X^{-1}(t)) is the probability of the event {e | X(e) = t}, one often
writes P(X = t) instead of P(X^{-1}(t)).


Example 15 (Some distribution functions) Suppose we roll a fair die. Then
U = {1,2,3,4,5,6} and P(i) = 1/6 for all i.


• If X(t) = t, then f_X(t) = P(X^{-1}(t)) = P(t) = 1/6 for t = 1, 2, 3, 4, 5, 6.


• If X(t) = 0 when t is even and X(t) = 1 when t is odd, then
f_X(0) = P(X^{-1}(0)) = P({2,4,6}) = 1/2 and f_X(1) = 1/2.


• If X(t) = −1 when t ≤ 2 and X(t) = 1 when t > 2, then
f_X(−1) = P({1,2}) = 1/3 and f_X(1) = 2/3. □
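A distribution function just pushes the probability of each outcome through X. In this minimal sketch, the sample space and P are assumed to be given as a dict from outcomes to exact Fraction probabilities (a representation chosen for illustration), and the function name is hypothetical; it groups outcomes by their X-value and reproduces the three cases of Example 15.

```python
from collections import defaultdict
from fractions import Fraction

def distribution(P, X):
    """Return f_X as a dict t -> P(X^{-1}(t)), where P maps each outcome to its probability."""
    f = defaultdict(Fraction)
    for outcome, prob in P.items():
        f[X(outcome)] += prob        # the outcome lies in X^{-1}(X(outcome))
    return dict(f)

die = {t: Fraction(1, 6) for t in range(1, 7)}
print(distribution(die, lambda t: t % 2))                 # 1 when t is odd, 0 when t is even
print(distribution(die, lambda t: -1 if t <= 2 else 1))
```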


The function f_X, in spite of its fancy sounding name, is nothing but a probability
function on the set Image(X). Why is that?
• Since P is a nonnegative function with range R, so is f_X.


• Since Coimage(X) = { X^{-1}(t) | t ∈ Image(X) } is a partition of the set U,

$$1 = P(U) = \sum_{t\in\mathrm{Image}(X)} P\bigl(X^{-1}(t)\bigr) = \sum_{t\in\mathrm{Image}(X)} f_X(t).$$

Thus, f_X is a nonnegative function from Image(X) to R such that

$$\sum_{t\in\mathrm{Image}(X)} f_X(t) = 1.$$

This is exactly the definition of a probability function on the set Image(X).


Example 16 (Distribution of six cards with replacement) First one card and then
a second card are selected at random, with replacement, from 6 cards numbered 1 to 6.
The basic sample space is S × S = {(i,j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}. Every point (i,j) in
this sample space is viewed as equally likely: P(i,j) = 1/36. Define a random variable X
on S × S by X(i,j) = i + j. S × S can be visualized as a 6 × 6 rectangular array and X
can be represented by inserting X(i,j) in the position represented by row i and column j.
This can be done as follows (rows are indexed by i, columns by j):

      |  1   2   3   4   5   6
  ----+------------------------
    1 |  2   3   4   5   6   7
    2 |  3   4   5   6   7   8
    3 |  4   5   6   7   8   9
    4 |  5   6   7   8   9  10
    5 |  6   7   8   9  10  11
    6 |  7   8   9  10  11  12


It is evident that Image(X) = {2,3,4,5,6,7,8,9,10,11,12}. The blocks of Coimage(X)
are the sets of pairs (i,j) for which i + j is constant. Thus,


X^{-1}(5) = {(1,4), (2,3), (3,2), (4,1)}
and
X^{-1}(8) = {(2,6), (3,5), (4,4), (5,3), (6,2)}.


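Since every point in S × S has probability 1/36, P(X^{-1}(i)) = |X^{-1}(i)|/36. Thus, with a
little counting in the previous figure, the distribution function of X, f_X(i) = |X^{-1}(i)|/36, in
two-line form is

$$f_X = \begin{pmatrix} 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\
\frac{1}{36} & \frac{2}{36} & \frac{3}{36} & \frac{4}{36} & \frac{5}{36} & \frac{6}{36} & \frac{5}{36} & \frac{4}{36} & \frac{3}{36} & \frac{2}{36} & \frac{1}{36} \end{pmatrix}.$$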
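The two-line table for f_X can be generated mechanically. This sketch (variable names are our own) enumerates the 36 equally likely pairs, counts |X^{-1}(s)| for each sum s with collections.Counter, and divides by 36; note that Fraction reduces values, so 2/36 prints as 1/18.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

S = range(1, 7)
counts = Counter(i + j for i, j in product(S, S))      # |X^{-1}(s)| for each sum s
f_X = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in f_X.items():
    print(s, p)
```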


Suppose we have two random variables X and Y defined on the same sample space U. 
Since the range of both X and Y is the real numbers, it makes sense to add the two random 
variables to get a new random variable: Z = X + Y; that is, Z(t) = X(t) + Y(t) for all
t ∈ U. Likewise, if r is a real number, then W = rX is a random variable, W(t) = rX(t)
for all t ∈ U. Thus we can do basic arithmetic with random variables.


For any random variable X on U, we have the following important definition:


Definition 7 (Expectation of a random variable) Let X be a random variable on
a sample space U with probability function P. The expectation E(X), or expected value of
X, is defined by

$$E(X) = \sum_{t\in U} X(t)\,P(t).$$

E(X) is often denoted by μ_X and referred to as the mean of X.


If we collect terms in the preceding sum according to the value of X(t), we obtain another
formula for the expectation:

$$E(X) = \sum_{t\in U} X(t)\,P(t) = \sum_{r} r \Bigl(\sum_{\substack{t\in U \\ X(t)=r}} P(t)\Bigr) = \sum_{r} r\,f_X(r).$$
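The two formulas for E(X) can be compared on Example 16. This sketch assumes the same 36 equally likely outcomes (the names are our own); it sums X(t)P(t) over the sample space, then sums r f_X(r) over the values r, and both give 7.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

U = list(product(range(1, 7), repeat=2))
P = {t: Fraction(1, 36) for t in U}

def X(t):
    """Sum of the two cards."""
    return t[0] + t[1]

# E(X) as a sum over the sample space U.
e_over_U = sum(X(t) * P[t] for t in U)

# E(X) as a sum over the values r, weighted by f_X(r).
f_X = Counter()
for t in U:
    f_X[X(t)] += P[t]
e_over_values = sum(r * p for r, p in f_X.items())

print(e_over_U, e_over_values)  # 7 7
```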


The expectation E is a function whose arguments are functions. Another example of
such a function is differentiation in calculus. Such functions, those whose arguments are
themselves functions, are sometimes called “operators.” Sometimes you will see a statement
such as E(2) = 2. Since the arguments of the expectation operator E are functions, the
“2” inside the parentheses is interpreted as the constant function whose value is 2 on all of
U. The second 2 in E(2) = 2 is the number 2.


If X and Y are two random variables on a sample space U with probability function
P, and Z = X + Y, then

$$E(Z) = \sum_{t\in U} \bigl(X(t) + Y(t)\bigr)P(t) = \sum_{t\in U} X(t)P(t) + \sum_{t\in U} Y(t)P(t) = E(X) + E(Y).$$


Similarly, for any real number r, E(rX) = rE(X). Putting these observations together, we 
have proved the following theorem: 


Theorem 3 (Linearity of expectation) If X and Y are two random variables on
a sample space U with probability function P and if a and b are real numbers, then
E(aX + bY) = aE(X) + bE(Y).
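Theorem 3 is easy to test numerically, and the proof above never uses any independence between X and Y. This small check evaluates both sides exactly on the two-card sample space of Example 16; the choice of Y (the larger card, which is strongly related to the sum X) and the coefficients a = 3, b = −2 are arbitrary and purely illustrative.

```python
from fractions import Fraction
from itertools import product

U = list(product(range(1, 7), repeat=2))
P = {t: Fraction(1, 36) for t in U}

def E(Z):
    """Expectation of the random variable Z (a function on U)."""
    return sum(Z(t) * P[t] for t in U)

def X(t):
    return t[0] + t[1]     # sum of the two cards

def Y(t):
    return max(t)          # larger card; not independent of X

a, b = 3, -2
lhs = E(lambda t: a * X(t) + b * Y(t))
rhs = a * E(X) + b * E(Y)
print(lhs == rhs)  # True
```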


We now introduce some additional functions defined on random variables, the covari- 
ance, variance, standard deviation, and correlation. 


Definition 8 (Covariance, variance, standard deviation, correlation) Let U be a 
sample space with probability function P, and let X and Y be random variables on U. 

• Then the covariance of X and Y, Cov(X,Y), is defined by 
$$\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y).$$

• The variance of X, Var(X) (also denoted by $\sigma_X^2$), is defined by 
$$\mathrm{Var}(X) = \mathrm{Cov}(X,X) = E(X^2) - (E(X))^2.$$

• The standard deviation of X is $\sigma_X = (\mathrm{Var}(X))^{1/2}$. 

• Finally, the correlation $\rho(X,Y)$ of X and Y is $\rho(X,Y) = \mathrm{Cov}(X,Y)/(\sigma_X \sigma_Y)$, 
provided $\sigma_X \neq 0$ and $\sigma_Y \neq 0$. 
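Definition 8 translates almost line by line into code. This is a minimal sketch under the same assumptions as before (three tosses of a fair coin, uniform probability function); the function names `E`, `cov`, `var`, `sd` and `corr` are hypothetical. Covariance and variance stay exact as fractions, while the standard deviation and correlation involve a square root and are returned as floats.

```python
import math
from fractions import Fraction
from itertools import product

# Three tosses of a fair coin, uniform probability function (an illustrative choice).
U = ["".join(t) for t in product("HT", repeat=3)]
P = {t: Fraction(1, 8) for t in U}

def E(X):
    return sum(X(t) * p for t, p in P.items())

def cov(X, Y):
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    return E(lambda t: X(t) * Y(t)) - E(X) * E(Y)

def var(X):
    """Var(X) = Cov(X, X)."""
    return cov(X, X)

def sd(X):
    """sigma_X = Var(X)^(1/2), returned as a float."""
    return math.sqrt(var(X))

def corr(X, Y):
    """rho(X, Y) = Cov(X, Y) / (sigma_X sigma_Y); undefined if either sigma is 0."""
    sx, sy = sd(X), sd(Y)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a standard deviation is 0")
    return cov(X, Y) / (sx * sy)

def heads(t):
    return t.count("H")

def tails(t):
    return t.count("T")

def first_is_head(t):
    return int(t[0] == "H")

print(var(heads), cov(heads, tails))   # 3/4 -3/4
print(corr(heads, tails))              # -1, up to floating-point rounding
print(corr(heads, first_is_head))      # about 0.577
```

Since the number of tails is 3 minus the number of heads, the correlation of −1 is what the text leads one to expect for Y = −X plus a constant.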


Example 17 (Sampling from a probability space) Consider the probability space 
(U, P) consisting of all possible outcomes from tossing a fair coin three times. Let the 
random variable X be the number of heads. Suppose we now actually toss a fair coin three 
times and record the result. This is called “sampling from the distribution.” Given our 
sample e ∈ U, we can compute X(e). Suppose we sample many times and average the 
values of X(e) that we compute. As we increase the number of samples, our average moves 
around; e.g., if our values of X(e) are 1, 2, 0, 1, 2, 3, ..., our averages of the first one, first 
two, first three, etc., are 1, 3/2, 3/3, 4/4, 6/5, 9/6, ..., which reduce to 1, 1.5, 1, 1, 1.2, 1.5, .... 
What can we say about the average? When we take a large number of samples the average 
will tend to be close to $\mu_X$, which is 1.5 in this case. Thus, without knowing P, we can 
estimate $\mu_X$ by sampling many times, computing X and averaging the results. 


The same idea applies to other functions of random variables: Sample many times, 
compute the function of the random variable(s) for each sample and average the results. In 
this way, we can estimate $E(X^2)$. In the previous paragraph, we estimated E(X). Combining 
these estimates, we obtain an estimate for $\mathrm{Var}(X) = E(X^2) - (E(X))^2$. Refinements 
of this procedure are discussed in statistics classes. □
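The sampling procedure of Example 17 can be simulated. This is a minimal sketch, assuming Python's pseudorandom generator stands in for real coin tosses; the seed, the number of samples and the function names are arbitrary choices made for illustration.

```python
import random

def sample_heads(rng):
    """Toss a fair coin three times and return the number of heads, X(e)."""
    return sum(rng.random() < 0.5 for _ in range(3))

def estimate_moments(num_samples, seed=0):
    """Average X and X^2 over num_samples independent samples."""
    rng = random.Random(seed)
    total = total_sq = 0
    for _ in range(num_samples):
        x = sample_heads(rng)
        total += x
        total_sq += x * x
    mean = total / num_samples            # estimate of E(X)
    mean_sq = total_sq / num_samples      # estimate of E(X^2)
    return mean, mean_sq, mean_sq - mean**2   # the last value estimates Var(X)

mean, mean_sq, variance = estimate_moments(100_000)
print(f"E(X) ~ {mean:.3f}, E(X^2) ~ {mean_sq:.3f}, Var(X) ~ {variance:.3f}")
```

For comparison, the exact values are E(X) = 1.5, E(X²) = 3 and Var(X) = 0.75; with this many samples the estimates should land close to them.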


Theorem 4 (General properties of covariance and variance) Let X, Y and Z be 
random variables on a probability space (U, P) and let a, b and c be real numbers. The 
covariance and variance satisfy the following properties: 


(1) (symmetry) Cov(X,Y) = Cov(Y,X). 


(2) (bilinearity) Cov(aX + bY, Z) = aCov(X,Z) + bCov(Y,Z) and 
Cov(X, aY + bZ) = aCov(X,Y) + bCov(X,Z). 


Thinking of a as the constant function equal to a and likewise for b, we have 
(3) Cov(a, X) = 0 and Cov(X + a, Y + b) = Cov(X,Y). 
In particular, $\mathrm{Cov}(X,Y) = E\big((X - \mu_X)(Y - \mu_Y)\big)$ and $\mathrm{Var}(X) = E\big((X - \mu_X)^2\big)$. 


(4) $\mathrm{Var}(aX + bY + c) = \mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + 2ab\,\mathrm{Cov}(X,Y) + b^2\,\mathrm{Var}(Y)$. 


The last two formulas in (3) are sometimes taken as the definition of the covariance and 
variance. In that case, the formulas for them in Definition 8 would be proved as a theorem. 
Note that the formula $\mathrm{Var}(X) = E\big((X - \mu_X)^2\big)$ says that Var(X) tells us something about 
how far X is likely to be from its mean. 
Proof: Property (1) follows immediately from the fact that XY = YX and E(X)E(Y) = 
E(Y)E(X). The following calculations prove (2). 

$$\begin{aligned}
\mathrm{Cov}(aX + bY, Z) &= E\big((aX + bY)Z\big) - E(aX + bY)E(Z) && \text{by definition}\\
&= aE(XZ) + bE(YZ) - \big(aE(X)E(Z) + bE(Y)E(Z)\big) && \text{by Theorem 3}\\
&= a\big(E(XZ) - E(X)E(Z)\big) + b\big(E(YZ) - E(Y)E(Z)\big)\\
&= a\,\mathrm{Cov}(X,Z) + b\,\mathrm{Cov}(Y,Z) && \text{by definition.}
\end{aligned}$$

The second formula in (2) then follows from the first by the symmetry (1). 
We now turn to (3). By definition, Cov(a, X) = E(aX) − E(a)E(X). Since E(aX) = 
aE(X) and E(a) = a, Cov(a, X) = 0. Using the various parts of the theorem, 

$$\begin{aligned}
\mathrm{Cov}(X + a, Y + b) &= \mathrm{Cov}(X, Y + b) + \mathrm{Cov}(a, Y + b) = \mathrm{Cov}(X, Y + b)\\
&= \mathrm{Cov}(X,Y) + \mathrm{Cov}(X,b) = \mathrm{Cov}(X,Y).
\end{aligned}$$

The particular results follow when we set $a = -\mu_X$ and $b = -\mu_Y$. 


You should be able to prove (4) by using (1), (2), (3) and the definition Var(Z) = 
Cov(Z,Z). □
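Property (4) and the centered form of the covariance in (3) can be confirmed exactly on a small example. In this sketch the sample space (three fair tosses), the random variables X = number of heads and Y = indicator that the first toss is H, and the constants a = 2, b = −3, c = 5 are all arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Three tosses of a fair coin, uniform probability function.
U = ["".join(t) for t in product("HT", repeat=3)]
P = {t: Fraction(1, 8) for t in U}

def E(X):
    return sum(X(t) * p for t, p in P.items())

def cov(X, Y):
    return E(lambda t: X(t) * Y(t)) - E(X) * E(Y)

def var(X):
    return cov(X, X)

def X(t):                 # number of heads
    return t.count("H")

def Y(t):                 # 1 if the first toss is H
    return int(t[0] == "H")

a, b, c = 2, -3, 5
def Z(t):                 # aX + bY + c
    return a * X(t) + b * Y(t) + c

lhs = var(Z)
rhs = a**2 * var(X) + 2 * a * b * cov(X, Y) + b**2 * var(Y)
print(lhs, rhs)           # 9/4 9/4

# Cov(X, Y) = E((X - mu_X)(Y - mu_Y)), the "in particular" form in (3).
muX, muY = E(X), E(Y)
centered = E(lambda t: (X(t) - muX) * (Y(t) - muY))
print(centered == cov(X, Y))   # True
```

The constant c drops out, as (4) says it should.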


Example 18 (General properties of correlation) Recall that the correlation of X 
and Y is $\rho(X,Y) = \mathrm{Cov}(X,Y)/(\sigma_X \sigma_Y)$, provided $\sigma_X \neq 0$ and $\sigma_Y \neq 0$. Since 

$$\mathrm{Cov}(X + c, Y + d) = \mathrm{Cov}(X,Y) \quad\text{and}\quad \mathrm{Var}(X + c) = \mathrm{Var}(X)$$

for any constant functions c and d, we have $\rho(X + c, Y + d) = \rho(X,Y)$ for any constant 
functions c and d. Suppose that X and Y both have mean zero. Note that 

$$0 \le E\big((X + tY)^2\big) = E(X^2) + 2E(XY)\,t + E(Y^2)\,t^2 = f(t)$$

defines a nonnegative polynomial of degree 2 in t. From high school math or from calculus, 
the minimum value of a polynomial of the form $A + 2Bt + Ct^2$ (C > 0) is 

$$A - B^2/C = E(X^2) - (E(XY))^2/E(Y^2).$$

Since f(t) ≥ 0 for all t, we have $E(X^2) - (E(XY))^2/E(Y^2) \ge 0$. Since X and Y have 
mean zero, 

$$\mathrm{Cov}(X,Y) = E(XY), \quad \mathrm{Var}(X) = E(X^2), \quad\text{and}\quad \mathrm{Var}(Y) = E(Y^2).$$

Thus, 

$$\frac{(E(XY))^2}{E(X^2)\,E(Y^2)} = (\rho(X,Y))^2 \le 1,$$

or, equivalently, $-1 \le \rho(X,Y) \le +1$. If the means of X and Y are not zero, then replace 
X and Y by $X - \mu_X$ and $Y - \mu_Y$, respectively, to obtain $-1 \le \rho(X,Y) \le +1$. □


We have just proved the following theorem about the correlation of random variables. 


Theorem 5 (Bounds on the correlation of two random variables) Let X and 
Y be random variables on a sample space U and let $\rho(X,Y)$ be their correlation. Then 
$-1 \le \rho(X,Y) \le +1$. 






The intuitive interpretation of the correlation $\rho(X,Y)$ is that values close to 1 mean 
points in U where X is large also tend to have Y large. Values close to −1 mean the 
opposite. As extreme examples, take Y = X. Then $\rho(X,Y) = 1$. If we take Y = −X then 
$\rho(X,Y) = -1$. 


Tchebycheff’s inequality¹⁰ is another easily proved inequality for random variables. It 
relates the tendency of a random variable to be far from its mean to the size of its variance. 
Such results are said to be “measures of central tendency.” 


Theorem 6 (Tchebycheff’s inequality) Let X be a random variable on a probability 
space (U, P). Suppose that E(X) = $\mu$ and Var(X) = $\sigma^2$. Let $\varepsilon > 0$ be a real number. Then 

$$P\big(\{u \mid |X(u) - \mu| \ge \varepsilon\}\big) \le \frac{\sigma^2}{\varepsilon^2}.$$

The left side of the inequality contains the set of all u for which $|X(u) - \mu| \ge \varepsilon$. Thus it 
can be thought of as the probability that the random variable X satisfies $|X - \mu| \ge \varepsilon$. 


The most important aspect of Tchebycheff’s inequality is the universality of its applicability: 
the random variable X is arbitrary. 


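Proof: Let’s look carefully at the computation of the variance: 

$$\mathrm{Var}(X) = E\big[(X - \mu)^2\big] = \sum_{\{u \,\mid\, |X(u) - \mu| \ge \varepsilon\}} (X(u) - \mu)^2 P(u) + \sum_{\{u \,\mid\, |X(u) - \mu| < \varepsilon\}} (X(u) - \mu)^2 P(u).$$

In breaking down the variance into these two sums, we have partitioned U into two disjoint 
sets $\{u \mid |X(u) - \mu| \ge \varepsilon\}$ and $\{u \mid |X(u) - \mu| < \varepsilon\}$. Since all terms are nonnegative, Var(X) is 
greater than or equal to either one of the above sums. In particular, 

$$\mathrm{Var}(X) = E\big[(X - \mu)^2\big] \ge \sum_{\{u \,\mid\, |X(u) - \mu| \ge \varepsilon\}} (X(u) - \mu)^2 P(u).$$

Note that, since $(X(u) - \mu)^2 \ge \varepsilon^2$ for every u appearing in this sum, 

$$\sum_{\{u \,\mid\, |X(u) - \mu| \ge \varepsilon\}} (X(u) - \mu)^2 P(u) \ge \varepsilon^2 \sum_{\{u \,\mid\, |X(u) - \mu| \ge \varepsilon\}} P(u) = \varepsilon^2\, P\big(\{u \mid |X(u) - \mu| \ge \varepsilon\}\big).$$

Putting all of this together gives $\sigma^2 \ge \varepsilon^2\, P(\{u \mid |X(u) - \mu| \ge \varepsilon\})$; dividing by $\varepsilon^2$ proves the theorem. □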
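Because Theorem 6 holds for every random variable, it can be compared with exact tail probabilities in any concrete case. The sketch below assumes, purely for illustration, that X is the number of heads in 20 tosses of a fair coin (so μ = 10 and σ² = 5); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, p):
    """f_X(k) for the number of heads in n independent tosses with P(H) = p."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def tchebycheff_check(f, eps):
    """Return (P(|X - mu| >= eps), sigma^2 / eps^2), both computed exactly from f."""
    mu = sum(k * pk for k, pk in f.items())
    var = sum((k - mu) ** 2 * pk for k, pk in f.items())
    tail = sum(pk for k, pk in f.items() if abs(k - mu) >= eps)
    return tail, var / eps**2

f = binomial_distribution(20, Fraction(1, 2))
for eps in range(1, 7):
    tail, bound = tchebycheff_check(f, eps)
    print(f"eps={eps}: P = {float(tail):.4f} <= bound = {float(bound):.4f}")
```

The bound always holds but is often far from tight (for small ε it even exceeds 1), which is the price of its universality.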


10 Tchebycheff is also spelled Chebyshev, depending on the system used for transliterating 
Russian. 
Joint Distributions 
A useful concept for working with pairs of random variables is the joint distribution function: 


Definition 9 (Joint distribution function) Let X and Y be random variables on 
a sample space U with probability function P. For each (i,j) ∈ Image(X) × Image(Y), 
define $h_{X,Y}(i,j) = P\big(X^{-1}(i) \cap Y^{-1}(j)\big)$. The function $h_{X,Y}$ is called the joint distribution 
function of X and Y. 


Recalling the meaning of the distribution functions $f_X$ and $f_Y$ (Definition 6), you should 
be able to see that 

$$f_X(i) = \sum_{j \in \mathrm{Image}(Y)} h_{X,Y}(i,j) \quad\text{and}\quad f_Y(j) = \sum_{i \in \mathrm{Image}(X)} h_{X,Y}(i,j).$$


Example 19 (A joint distribution for coin tosses) A fair coin is tossed three times, 
recording H if heads, T if tails. The sample space is 


U = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. 


Let X be the random variable defined by $X(t_1t_2t_3) = 0$ if $t_1 = T$ and $X(t_1t_2t_3) = 1$ if 
$t_1 = H$. Let Y be the random variable that counts the number of times T occurs in 
the three tosses. Image(X) × Image(Y) = {0,1} × {0,1,2,3}. We compute $h_{X,Y}(0,2)$: 
$X^{-1}(0) = \{THH, THT, TTH, TTT\}$, $Y^{-1}(2) = \{HTT, THT, TTH\}$, and 
$X^{-1}(0) \cap Y^{-1}(2) = \{THT, TTH\}$. Thus, $h_{X,Y}(0,2) = 2/8$. Computing the rest of the values of $h_{X,Y}$, we can 
represent the results in the following table: 


          Y=0    Y=1    Y=2    Y=3  |  f_X
  X=0      0     1/8    2/8    1/8  |  1/2
  X=1     1/8    2/8    1/8     0   |  1/2
  f_Y     1/8    3/8    3/8    1/8  |


In this table, the values of $h_{X,Y}(i,j)$ are contained in the submatrix 


   0    1/8   2/8   1/8 
  1/8   2/8   1/8    0 


The last column gives the distribution function $f_X$ and the last row gives the distribution 
function $f_Y$. These distributions are called the marginal distributions of the joint 
distribution $h_{X,Y}$. You should check that E(X) = 1/2, E(Y) = 3/2, Var(X) = 1/4, 
Var(Y) = 3/4. Compute also that $E(XY) = \sum ij\, h_{X,Y}(i,j) = 1/2$, where the sum is 
over (i,j) ∈ {0,1} × {0,1,2,3}. Thus, Cov(X,Y) = 1/2 − (1/2)(3/2) = −1/4. Putting this all together gives 
$\rho(X,Y) = -3^{-1/2}$. It should be no surprise that the correlation is negative. If X is “large” 
(i.e., 1) then Y should be “small,” since the first toss came up H, making the total number 
of T’s at most 2. □
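The table and the numbers you are asked to check in Example 19 can be reproduced directly from Definition 9. This is a minimal sketch; the function names `joint` and `marginals` are hypothetical, and pairs (i, j) with probability zero (such as X = 0, Y = 0) are simply left out of the dictionary `h`.

```python
import math
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Example 19: three tosses of a fair coin.
U = ["".join(t) for t in product("HT", repeat=3)]
P = {t: Fraction(1, 8) for t in U}

def X(t):                 # 1 if the first toss is H, 0 if it is T
    return int(t[0] == "H")

def Y(t):                 # number of T's in the three tosses
    return t.count("T")

def joint(X, Y):
    """h_{X,Y}(i, j) = P(X^-1(i) intersect Y^-1(j)); zero-probability pairs are omitted."""
    h = defaultdict(Fraction)
    for t, p in P.items():
        h[(X(t), Y(t))] += p
    return dict(h)

def marginals(h):
    """f_X(i) = sum over j of h(i, j) and f_Y(j) = sum over i of h(i, j)."""
    fX, fY = defaultdict(Fraction), defaultdict(Fraction)
    for (i, j), p in h.items():
        fX[i] += p
        fY[j] += p
    return dict(fX), dict(fY)

h = joint(X, Y)
fX, fY = marginals(h)
EX = sum(i * p for i, p in fX.items())
EY = sum(j * p for j, p in fY.items())
VarX = sum(i * i * p for i, p in fX.items()) - EX**2
VarY = sum(j * j * p for j, p in fY.items()) - EY**2
EXY = sum(i * j * p for (i, j), p in h.items())
Cov = EXY - EX * EY
rho = Cov / math.sqrt(VarX * VarY)

print(h[(0, 2)], fY[2])                  # 1/4 3/8
print(EX, EY, VarX, VarY, EXY, Cov)      # 1/2 3/2 1/4 3/4 1/2 -1/4
print(rho)                               # about -0.577, that is, -3**(-1/2)
```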
Example 20 (Another joint distribution for coin tosses) As before, a fair coin is 
tossed three times, recording H if heads, T if tails. The sample space is 


U = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. 


Again, let X be the random variable defined by $X(t_1t_2t_3) = 0$ if $t_1 = T$ and $X(t_1t_2t_3) = 1$ 
if $t_1 = H$. However, now let Y be the random variable that counts the number of times 
T occurs in the last two tosses. Image(X) × Image(Y) = {0,1} × {0,1,2}. We compute 
$h_{X,Y}(0,2)$: $X^{-1}(0) = \{THH, THT, TTH, TTT\}$, $Y^{-1}(2) = \{HTT, TTT\}$, and 
$X^{-1}(0) \cap Y^{-1}(2) = \{TTT\}$. Thus, $h_{X,Y}(0,2) = 1/8$. Computing the rest of the values of $h_{X,Y}$, we 
can represent the results in the following table: 


          Y=0    Y=1    Y=2  |  f_X
  X=0     1/8    2/8    1/8  |  1/2
  X=1     1/8    2/8    1/8  |  1/2
  f_Y     1/4    1/2    1/4  |


You should compute that E(X) = 1/2, E(Y) = 1, Var(X) = 1/4, Var(Y) = 1/2 and 
Cov(X,Y) = 0. Since all the previous numbers were nonzero, it is rather surprising that 
the covariance is zero. This is a consequence of “independence,” which we will study 
next. □


Independence 


If the expectation of the sum of random variables is the sum of the expectations, then 
maybe the expectation of the product is the product of the expectations? Not so. We’ll 
look at a simple example with Y = X. Let U = {H,T} with P(H) = P(T) = 1/2, and 
let X(T) = 0 and X(H) = 1 define a random variable on (U, P). Then E(X) = X(T)(1/2) + 
X(H)(1/2) = 1/2. Since $X^2 = X$, we have $E(XX) = E(X^2) = E(X) = 1/2$. This does 
not equal $E(X)^2 = 1/4$. In order to give some general sufficient conditions for when 
E(XY) = E(X)E(Y), we need the following definition. 


Definition 10 (Independence) Let (U, P) be a probability space. 


• If A ⊆ U and B ⊆ U are events (subsets) of U such that P(A ∩ B) = P(A)P(B), 
then A and B are said to be a pair of independent events. 


• If X and Y are random variables on U such that 


for all s ∈ Image(X) and all t ∈ Image(Y), 
$X^{-1}(s)$ and $Y^{-1}(t)$ are a pair of independent events, 


then X and Y are said to be a pair of independent random variables. 
The definition of independence for events and random variables sounds a bit technical. 
In practice we will use independence in a very intuitive way. 


Recall the definitions of the marginal and joint distributions $f_X$, $f_Y$ and $h_{X,Y}$ in 
Definition 9. Using that notation, the definition of independence can be rephrased as 


X and Y are independent random variables 
if and only if 
$h_{X,Y}(i,j) = f_X(i)\,f_Y(j)$ for all (i,j) ∈ Image(X) × Image(Y). 


You should verify that X and Y are independent in Example 20. Intuitively, knowing what 
happens on the first toss gives us no information about the second and third tosses. We 
explore this a bit in the next example. 
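The rephrased criterion h(i, j) = f_X(i) f_Y(j) is easy to test mechanically. Here is a minimal sketch (the name `is_independent` is hypothetical) that applies it to the pair from Example 19 and the pair from Example 20. It loops over every (i, j) in Image(X) × Image(Y), so a cell of the joint table with probability 0 is checked too.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Three tosses of a fair coin, uniform probability function.
U = ["".join(t) for t in product("HT", repeat=3)]
P = {t: Fraction(1, 8) for t in U}

def is_independent(X, Y):
    """Check h_{X,Y}(i, j) == f_X(i) f_Y(j) for every (i, j) in Image(X) x Image(Y)."""
    h, fX, fY = defaultdict(Fraction), defaultdict(Fraction), defaultdict(Fraction)
    for t, p in P.items():
        h[(X(t), Y(t))] += p
        fX[X(t)] += p
        fY[Y(t)] += p
    # A pair never seen in h reads as 0, which is its true joint probability.
    return all(h[(i, j)] == fX[i] * fY[j] for i in fX for j in fY)

def X(t):        # 1 if the first toss is H
    return int(t[0] == "H")

def Y19(t):      # Example 19: number of T's in all three tosses
    return t.count("T")

def Y20(t):      # Example 20: number of T's in the last two tosses
    return t[1:].count("T")

print(is_independent(X, Y19), is_independent(X, Y20))   # False True
```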


Example 21 (Independent coin tosses) Suppose a fair coin is tossed twice, one after 
the other. In everyday language, we think of the tosses as being independent. We’ll see 
that this agrees with our mathematical definition of independence. 


The sample space is U = {(H,H), (H,T), (T,H), (T,T)}. Note that U = {H,T} × 
{H,T} and P is the uniform probability function on U; i.e., P(e) = 1/4 for each e ∈ U. 


Let A be the event that the first toss is H and let B be the event that the second is H. 
Thus A = {HH, HT} and B = {HH, TH}. You should be able to see that P(A) = 1/2, 
P(B) = 1/2 and 

$$P(A \cap B) = P(\{HH\}) = 1/4 = (1/2)(1/2) = P(A)P(B),$$

and so A and B are independent. 
What about independent random variables? Let 

$$X(t_1,t_2) = \begin{cases} 0, & \text{if } t_1 = T,\\ 1, & \text{if } t_1 = H, \end{cases}$$

and 

$$Y(t_1,t_2) = \begin{cases} 0, & \text{if } t_2 = T,\\ 1, & \text{if } t_2 = H. \end{cases}$$

Thus X “looks at” just the first toss and Y “looks at” just the second. You should be able 
to verify that $X^{-1}(1) = A$, $X^{-1}(0) = A^c$, $Y^{-1}(1) = B$ and $Y^{-1}(0) = B^c$. To see that 
X and Y are independent, we must verify that each of the following 4 pairs of events is 
independent: 

A and B,   A and $B^c$,   $A^c$ and B,   $A^c$ and $B^c$. 


In the previous paragraph, we saw that A and B are independent. You should be able to 
do the other 3. 


This seems like a lot of work to verify the mathematical notion of independence, 
compared with the obvious intuitive notion. Why bother? There are two reasons. First, 
we want to see that the two notions of independence are the same. Second, we can’t do 
any calculations with an intuitive notion, but the mathematical definition will allow us to 
obtain useful results. □


The preceding example can be generalized considerably. The result is an important 
method for building up new probability spaces from old ones. 
*Example 22 (Product spaces and independence) Let $(U_1, P_1)$ and $(U_2, P_2)$ be probability spaces. Define the product space (U, P) by 

$$U = U_1 \times U_2 \quad \text{(Cartesian product)}, \qquad P(e_1, e_2) = P_1(e_1) \times P_2(e_2) \quad \text{(multiplication of numbers)}.$$

Suppose $A = A_1 \times U_2$ and $B = U_1 \times B_2$. We claim that A and B are independent events. 
Before proving this, let’s see how it relates to the previous example. 


Suppose $U_1 = U_2 = \{H,T\}$ and that $P_1$ and $P_2$ are the uniform probability functions. 
Then $(U_1, P_1)$ describes the first toss and $(U_2, P_2)$ describes the second. Also, you should 
check that (U, P) is the same probability space as in the previous example. Check that, if 
$A_1 = \{H\}$ and $B_2 = \{H\}$, then A and B are the same as in the previous example. 


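We now prove that A and B are independent in our general setting. We have 

$$\begin{aligned}
P(A) &= \sum_{e \in A} P(e) && \text{definition of } P(A)\\
&= \sum_{e \in A_1 \times U_2} P(e) && \text{definition of } A\\
&= \sum_{e_1 \in A_1} \sum_{e_2 \in U_2} P(e_1, e_2) && \text{definition of Cartesian product}\\
&= \sum_{e_1 \in A_1} \sum_{e_2 \in U_2} P_1(e_1)P_2(e_2) && \text{definition of } P\\
&= \Big(\sum_{e_1 \in A_1} P_1(e_1)\Big) \times \Big(\sum_{e_2 \in U_2} P_2(e_2)\Big) && \text{algebra}\\
&= \sum_{e_1 \in A_1} P_1(e_1) \times 1 && \text{definition of probability}\\
&= P_1(A_1) && \text{definition of } P_1(A_1).
\end{aligned}$$

Similarly, $P(B) = P_2(B_2)$. You should verify that $A \cap B = A_1 \times B_2$. By doing calculations 
similar to what we did for P(A), you should show that $P(A_1 \times B_2) = P_1(A_1)P_2(B_2)$. Since 
$P(A)P(B) = P_1(A_1)P_2(B_2)$, this proves independence. 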


Suppose X is a random variable such that $X(e_1,e_2)$ depends only on $e_1$. What does 
$X^{-1}(r)$ look like? Suppose $X(e_1,e_2) = r$ so that $(e_1,e_2) \in X^{-1}(r)$. Since X does not 
depend on $e_2$, $X(e_1,u_2) = r$ for all $u_2 \in U_2$. Thus $\{e_1\} \times U_2 \subseteq X^{-1}(r)$. Proceeding in this 
way, one can show that $X^{-1}(r) = A_1 \times U_2$ for some $A_1 \subseteq U_1$. 


Suppose Y is a random variable such that $Y(e_1,e_2)$ depends only on $e_2$. We then have 
$Y^{-1}(s) = U_1 \times B_2$ for some $B_2 \subseteq U_2$. By our earlier work in this example, it follows that 
$X^{-1}(r)$ and $Y^{-1}(s)$ are independent events and so X and Y are independent. 


What made this work? X “looked at” just the first component and Y “looked at” just 
the second. 


This can be generalized to the product of any number of probability spaces. Random 
variables X and Y will be independent if the set of components that X “looks at” is disjoint 
from the set of components that Y “looks at.” For example, suppose a coin is tossed 
“independently” 20 times. Let X count the number of heads in the first 10 tosses and let 
Y count the number of tails in the last 5 tosses. Then X and Y are independent. □
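The product construction can be tried out on two spaces that are not both coin tosses. In this sketch the first factor is a biased coin with P₁(H) = 2/3 and the second is a fair die; both are hypothetical choices, as are the names `product_space` and `prob`. The events A = A₁ × U₂ and B = U₁ × B₂ come out independent, as claimed in Example 22.

```python
from fractions import Fraction
from itertools import product

def product_space(P1, P2):
    """Return the product probability function P(e1, e2) = P1(e1) * P2(e2) on U1 x U2."""
    return {(e1, e2): p1 * p2 for (e1, p1), (e2, p2) in product(P1.items(), P2.items())}

def prob(P, event):
    """P(event) as the sum of P(e) over the outcomes e in the event."""
    return sum(P[e] for e in event)

P1 = {"H": Fraction(2, 3), "T": Fraction(1, 3)}     # a biased coin (illustrative)
P2 = {d: Fraction(1, 6) for d in range(1, 7)}        # a fair die (illustrative)
P = product_space(P1, P2)

A1, B2 = {"H"}, {2, 4, 6}
A = {(e1, e2) for (e1, e2) in P if e1 in A1}         # A = A1 x U2
B = {(e1, e2) for (e1, e2) in P if e2 in B2}         # B = U1 x B2

print(A & B == {(e1, e2) for e1 in A1 for e2 in B2})           # True: A ∩ B = A1 x B2
print(prob(P, A) == prob(P1, A1), prob(P, B) == prob(P2, B2))  # True True
print(prob(P, A & B) == prob(P, A) * prob(P, B))               # True
```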


We now return to E(XY), with the assumption that X and Y are independent random 
variables on a sample space U with probability function P. We also look at variance, 
covariance, and correlation. 


Theorem 7 (Properties of independent random variables) Suppose that U is 
a sample space with probability function P and that X and Y are independent random 
variables on U. Then the following are true: 


• E(XY) = E(X)E(Y), 

• Cov(X,Y) = 0 and $\rho(X,Y) = 0$, 

• Var(X + Y) = Var(X) + Var(Y), 

• if $f, g \colon \mathbb{R} \to \mathbb{R}$, then f(X) and g(Y) are independent random variables. 


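Proof: First note that there are two ways to compute E(Z) for a random variable Z: 

$$\begin{aligned}
E(Z) &= \sum_{u \in U} Z(u)P(u) && \text{(the first way)}\\
&= \sum_{k \in \mathrm{Image}(Z)} k\,P\big(Z^{-1}(k)\big) = \sum_{k \in \mathrm{Image}(Z)} k\,f_Z(k) && \text{(the second way).}
\end{aligned}$$

For Z = XY, we use the second way. If Z = k, then X = i and Y = j for some i and j 
with ij = k, so $Z^{-1}(k)$ is the disjoint union of the sets $X^{-1}(i) \cap Y^{-1}(j)$ with ij = k. Thus 

$$E(Z) = \sum_{k \in \mathrm{Image}(Z)} k\,P\big(Z^{-1}(k)\big) = \sum_{\substack{i \in \mathrm{Image}(X)\\ j \in \mathrm{Image}(Y)}} ij\,P\big(X^{-1}(i) \cap Y^{-1}(j)\big).$$

From the definition of independence, $P\big(X^{-1}(i) \cap Y^{-1}(j)\big) = P\big(X^{-1}(i)\big)P\big(Y^{-1}(j)\big)$, and 
hence 

$$\begin{aligned}
\sum_{\substack{i \in \mathrm{Image}(X)\\ j \in \mathrm{Image}(Y)}} ij\,P\big(X^{-1}(i) \cap Y^{-1}(j)\big) &= \sum_{\substack{i \in \mathrm{Image}(X)\\ j \in \mathrm{Image}(Y)}} ij\,P\big(X^{-1}(i)\big)P\big(Y^{-1}(j)\big)\\
&= \Big(\sum_{i \in \mathrm{Image}(X)} i\,P\big(X^{-1}(i)\big)\Big)\Big(\sum_{j \in \mathrm{Image}(Y)} j\,P\big(Y^{-1}(j)\big)\Big).
\end{aligned}$$

The right-hand side of the above equation is just E(X)E(Y). This proves the first part of 
the theorem. 


By Definition 8 and the fact that we have just proved E(XY) = E(X)E(Y), it follows 
that Cov(X,Y) = 0. It then follows from Definition 8 that $\rho(X,Y) = 0$. You should use 
some of the results in Theorem 4 to show that Var(X + Y) = Var(X) + Var(Y). 


We omit the proof of the last part of the theorem; however, you should note that f(X) 
is a function on U because, for u ∈ U, f(X)(u) = f(X(u)). □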
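Theorem 7 can be checked exactly on a product space in the spirit of Example 22. This sketch assumes six fair, independent tosses; X counts heads among tosses 1 to 3 and Y counts tails among tosses 5 and 6, so they "look at" disjoint components. The functions f(x) = x² and g(y) = 2^y are arbitrary illustrative choices for the last part of the theorem.

```python
from fractions import Fraction
from itertools import product

# Six independent tosses of a fair coin: the product of six copies of ({H, T}, uniform).
U = list(product("HT", repeat=6))
P = {t: Fraction(1, 2**6) for t in U}

def E(Z):
    return sum(Z(t) * p for t, p in P.items())

def cov(X, Y):
    return E(lambda t: X(t) * Y(t)) - E(X) * E(Y)

def X(t):    # heads among tosses 1-3
    return t[:3].count("H")

def Y(t):    # tails among tosses 5-6
    return t[4:].count("T")

def S(t):    # X + Y
    return X(t) + Y(t)

print(E(lambda t: X(t) * Y(t)) == E(X) * E(Y))    # True
print(cov(X, Y))                                   # 0
print(cov(S, S) == cov(X, X) + cov(Y, Y))          # True: Var(X + Y) = Var(X) + Var(Y)

# f(X) and g(Y) are again independent, so the expectation of their product factors.
def fX(t):
    return X(t) ** 2

def gY(t):
    return 2 ** Y(t)

print(E(lambda t: fX(t) * gY(t)) == E(fX) * E(gY))   # True
```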
*Example 23 (Generating random permutations) Suppose we want to generate 
permutations of n randomly so that each permutation is equally likely to occur. How can 
we do so? We now look at a simple, efficient method for doing this. The procedure is based 
on a bijection between two sets: 


• $\mathrm{Seq}_n$, the set of all sequences $a_2, a_3, \ldots, a_n$ of integers with $1 \le a_k \le k$ for all k, and 
• $\mathrm{Perm}_n$, the set of all permutations of n = {1, 2, ..., n}. 


We can tell there must be such a bijection even without giving one! Why is this? Since 
there are k choices for $a_k$, the number of sequences $a_2, a_3, \ldots, a_n$ is $2 \times 3 \times \cdots \times n = n!$. We 
know that there are n! permutations. Thus $|\mathrm{Seq}_n| = |\mathrm{Perm}_n|$. Because $\mathrm{Seq}_n$ and $\mathrm{Perm}_n$ 
have the same size, there must be a bijection between them. Since there’s a bijection, 
what’s the problem? The problem is to find one that is easy to use. 


Before providing the bijection, let’s look at how to use it to generate permutations 
uniformly at random. It is easy to generate the sequences uniformly at random: Choose $a_k$ 
uniformly at random from k = {1, ..., k} and choose each of the $a_k$ independently. This makes $\mathrm{Seq}_n$ into 
a probability space with the uniform probability distribution. Once we have a sequence, 
we use the bijection to construct the permutation that corresponds to the sequence. This 
makes $\mathrm{Perm}_n$ into a probability space with the uniform probability distribution. 


Now we want to specify a bijection. There are lots of choices. (You should be able to 
show that there are (n!)! bijections.) Here’s one that is easy to use: 


Step 1. Write out the sequence 1, 2, ..., n and set k = 2. 


Step 2. If $a_k \neq k$, swap the elements in positions k and $a_k$ in the sequence. If $a_k = k$, do 
nothing. 


Step 3. If k < n, increase k by 1 and go to Step 2. If k = n, stop. 
The result is a permutation in one-line form. 


For example, suppose n = 5 and the sequence of $a_k$’s is 2, 1, 3, 3. Here’s what happens, 
where “next step” tells which step to use to produce the permutation on the next line: 


  action               permutation    information       next step
  the start (Step 1)   1,2,3,4,5      k = 2, a_2 = 2    Step 2
  do nothing           1,2,3,4,5      k = 3, a_3 = 1    Step 2
  swap at 3 and 1      3,2,1,4,5      k = 4, a_4 = 3    Step 2
  swap at 4 and 3      3,2,4,1,5      k = 5, a_5 = 3    Step 2
  swap at 5 and 3      3,2,5,1,4      all done


Thus, the sequence 2, 1, 3, 3 corresponds to the permutation 3, 2, 5, 1, 4. 
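Steps 1 to 3 translate directly into a short program. This is a minimal sketch: `sequence_to_permutation` and `random_permutation` are hypothetical names, the sequence is passed as a Python list [a_2, ..., a_n], and positions are 1-based as in the text. The procedure is closely related to what is commonly called the Fisher–Yates (or Knuth) shuffle.

```python
import random

def sequence_to_permutation(a):
    """Apply Steps 1-3 to the sequence a = [a_2, ..., a_n], where 1 <= a_k <= k.
    Returns the resulting permutation of 1..n in one-line form."""
    n = len(a) + 1
    perm = list(range(1, n + 1))              # Step 1: the sequence 1, 2, ..., n
    for k in range(2, n + 1):                 # Step 3 advances k from 2 up to n
        ak = a[k - 2]
        if not 1 <= ak <= k:
            raise ValueError(f"a_{k} = {ak} is not in 1..{k}")
        if ak != k:                           # Step 2: swap positions k and a_k (1-based)
            perm[k - 1], perm[ak - 1] = perm[ak - 1], perm[k - 1]
    return perm

def random_permutation(n, rng=random):
    """Choose each a_k uniformly from 1..k, independently, then apply the bijection."""
    a = [rng.randint(1, k) for k in range(2, n + 1)]
    return sequence_to_permutation(a)

print(sequence_to_permutation([2, 1, 3, 3]))   # [3, 2, 5, 1, 4]
print(random_permutation(6))                   # some permutation of 1..6
```

The first call reproduces the worked example above. Because each sequence is equally likely and the map is a bijection, every permutation of n is produced with probability 1/n!.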


How can we prove that this is a bijection? We’ve described a function F from $\mathrm{Seq}_n$ 
to $\mathrm{Perm}_n$. Since $|\mathrm{Seq}_n| = |\mathrm{Perm}_n|$, we can prove that F is a bijection if we can show that 
it is a surjection. (This is just Exercise 1.2(c).) In other words, we want to show that, for 
every permutation p in $\mathrm{Perm}_n$, there is a sequence a in $\mathrm{Seq}_n$ such that F(a) = p. 


Let’s try an example. Suppose we have the permutation 4, 1, 3, 2, 6, 5. What sequence 
does it come from? This is a permutation of 6. The only way 6 could move from the last 
place is because $a_6 \neq 6$. In fact, since 6 is in the fifth place, we must have had $a_6 = 5$, 
which caused us to swap the fifth and sixth positions. So, just before we used $a_6 = 5$, 
we had the permutation 4, 1, 3, 2, 5, 6. None of $a_2, \ldots, a_5$ can move what’s in position 6 
since they are all less than 6, so $a_2, \ldots, a_5$ must have rearranged 1, 2, 3, 4, 5, (6) to give 
4, 1, 3, 2, 5, (6). (We’ve put 6 in parentheses to remember that it’s there and that none of 
$a_2, \ldots, a_5$ can move it.) How did this happen? Well, only $a_5$ could have affected position 
5. Since 5 is there, it didn’t move and so $a_5 = 5$. Now we’re back to 4, 1, 3, 2, (5, 6) and 
trying to find $a_4$. Since 4 is in the first position, $a_4 = 1$. So, just before using $a_4$, we 
had 2, 1, 3, (4, 5, 6). Thus $a_3 = 3$ and we’re back to 2, 1, (3, 4, 5, 6). Finally $a_2 = 1$. We’ve 
found that the sequence 1, 3, 1, 5, 5 gives the permutation 4, 1, 3, 2, 6, 5. You should apply 
the algorithm described earlier in Steps 1, 2, and 3 and so see for yourself that the sequence 
gives the permutation. 


The idea in the previous paragraph can be used to give a proof by induction on n. For 
those of you who would like to see it, here’s the proof. The fact that F is a surjection is 
easily checked for n = 2: There are two sequences, namely 1 and 2. These correspond to the 
two permutations 2, 1 and 1, 2, respectively. Suppose n > 2 and $p_1, \ldots, p_n$ is a permutation 
of n. We need to find the position of n in the permutation. The position is that k for which 
$p_k = n$. So we set $a_n = k$ and define a new permutation $q_1, \ldots, q_{n-1}$ of {1, 2, ..., n − 1} to 
correspond to the situation just before using $a_n = k$: 


• If k = n, then $q_i = p_i$ for $1 \le i \le n-1$. 
• If $k \neq n$, then $q_k = p_n$ and $q_i = p_i$ for $1 \le i < k$ and for $k < i \le n-1$. 


You should be able to see that $q_1, \ldots, q_{n-1}$ is a permutation of n − 1. By induction, there 
is a sequence $a_2, \ldots, a_{n-1}$ that gives $q_1, \ldots, q_{n-1}$ when we apply our 3-step procedure to 
1, 2, 3, ..., (n − 1). After that, we must apply $a_n = k$ to $q_1, \ldots, q_{n-1}, n$. What happens? 
You should be able to see that it gives us $p_1, \ldots, p_n$. This completes the proof. □
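The inductive argument also gives an algorithm for going backwards, from a permutation to its sequence. The following is a minimal sketch of that inverse map (the name `permutation_to_sequence` is hypothetical): for k = n, n − 1, ..., 2 it reads off a_k as the current position of k and then undoes the swap of Step 2, exactly as in the worked example with 4, 1, 3, 2, 6, 5.

```python
from itertools import permutations

def permutation_to_sequence(p):
    """Given a permutation p of 1..n in one-line form, return [a_2, ..., a_n].

    At stage k (from n down to 2) the value k sits at position a_k; swapping
    positions k and a_k undoes Step 2 and leaves the arrangement that existed
    just before a_k was used."""
    q = list(p)
    n = len(q)
    a = {}
    for k in range(n, 1, -1):
        pos = q.index(k) + 1                          # 1-based position of k
        a[k] = pos
        q[k - 1], q[pos - 1] = q[pos - 1], q[k - 1]   # put k back in position k
    return [a[k] for k in range(2, n + 1)]

print(permutation_to_sequence([4, 1, 3, 2, 6, 5]))   # [1, 3, 1, 5, 5]

# Every permutation of 1..5 gives a valid sequence (1 <= a_k <= k), and distinct
# permutations give distinct sequences, consistent with F being a bijection.
seqs = {tuple(permutation_to_sequence(p)) for p in permutations(range(1, 6))}
print(len(seqs), all(1 <= ak <= k for s in seqs for k, ak in zip(range(2, 6), s)))   # 120 True
```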


Some Standard Distributions 


We now take a look at some examples of random variables and their distributions that 
occur often in applications. The first such distribution is the binomial distribution. 


Example 24 (Binomial distribution) Suppose we toss a coin, sequentially and independently, n times, recording H for heads and T for tails. Suppose the probability of H in 
a single toss of the coin is p. Define 

$$P^*(t) = \begin{cases} p, & \text{if } t = H,\\ q = 1 - p, & \text{if } t = T. \end{cases}$$

Our sample space is $U = \times^n \{H,T\}$ and the probability function P is given by 
$P(t_1, \ldots, t_n) = P^*(t_1)\cdots P^*(t_n)$ because of independence. This is an example of a product space. We discussed 
product spaces in Example 22. 


Define the random variable $X(t_1, \ldots, t_n)$ to be the number of H’s in the sequence 
$(t_1, \ldots, t_n)$. This is a standard example of a binomial random variable. 
We want to compute P(X = k) for $k \in \mathbb{R}$. Note that Image(X) = {0, ..., n}. 
Hence P(X = k) = 0 if k is not in {0, ..., n}. Note that $(t_1, \ldots, t_n) \in X^{-1}(k)$ if and 
only if $(t_1, \ldots, t_n)$ contains exactly k heads (H’s). In this case, $P(t_1, \ldots, t_n) = p^k q^{n-k}$. 
Since all elements of $X^{-1}(k)$ have the same probability $p^k q^{n-k}$, it follows that 
$f_X(k) = |X^{-1}(k)|\, p^k q^{n-k}$. What is the value of $|X^{-1}(k)|$? It is the number of sequences with exactly 
k heads. Since the positions for k heads must be chosen from among the n tosses, 
$|X^{-1}(k)| = \binom{n}{k}$. Thus $f_X(k) = \binom{n}{k} p^k q^{n-k}$. This is the binomial distribution function. A 
common alternative notation for this distribution function is b(k; n, p). This notation has 
the advantage of explicitly referencing the parameters, n and p. 


An alternative way of thinking about the random variable X is to write it as a sum, 
$X = X_1 + \cdots + X_n$, of n independent random variables. The random variable $X_i$ is defined 
on the sample space $U = \times^n \{H,T\}$ by the rule 

$$X_i(t_1, \ldots, t_n) = \begin{cases} 1, & \text{if } t_i = H,\\ 0, & \text{if } t_i = T. \end{cases}$$

Using this representation of X, we can compute $E(X) = E(X_1) + \cdots + E(X_n)$ and, since 
the $X_i$ are independent (see Theorem 7), $\mathrm{Var}(X) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)$. Computation gives 

$$E(X_i) = 1 \times P(X_i = 1) + 0 \times P(X_i = 0) = p$$
and 
$$\mathrm{Var}(X_i) = E(X_i^2) - E(X_i)^2 = p - p^2 = p(1-p),$$

where we have used $X_i^2 = X_i$ because $X_i$ must be 0 or 1. Thus, we obtain E(X) = np and 
Var(X) = np(1 − p) = npq. □
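The formulas E(X) = np and Var(X) = npq can be confirmed by summing the distribution b(k; n, p) directly. This is a minimal sketch with exact fractions; the parameters n = 10 and p = 3/10 are arbitrary illustrative values, and `b` is a hypothetical helper name echoing the notation b(k; n, p).

```python
from fractions import Fraction
from math import comb

def b(k, n, p):
    """b(k; n, p) = C(n, k) p^k q^(n-k) with q = 1 - p, and 0 for k outside 0..n."""
    if not 0 <= k <= n:
        return 0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, Fraction(3, 10)          # illustrative parameters
total = sum(b(k, n, p) for k in range(n + 1))
mean = sum(k * b(k, n, p) for k in range(n + 1))
var = sum(k * k * b(k, n, p) for k in range(n + 1)) - mean**2
print(total, mean, var)             # 1 3 21/10, that is, 1, np and npq
```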


Of course, the binomial distribution is not restricted to coin tosses, but is defined for 
any series of outcomes that 


• are restricted to two possibilities, 
• are independent, and 
• have a fixed probability p of one outcome, 1 − p of the other outcome. 


Our next example is a random variable X that is defined on a countably infinite sample 
space U. This distribution, the Poisson, is associated with random distributions of objects. 


Example 25 (Poisson distribution and its properties) Suppose a 500 page book 
has 2,000 misprints. If the misprints are distributed randomly, what is the probability of 
exactly k misprints appearing on page 95? (We want the answers for k = 0,1,2,....) 


Imagine that the misprints are all in a bag. When we take out a misprint, it appears 
on page 95 with probability 1/500. Call the case in which a misprint appears on page 95 a 
“success” and the case when it does not a “failure.” We have just seen that, for a randomly 
selected misprint, the probability of success is p = 1/500. Since we have assumed the 
misprints are independent, we can use the binomial distribution. Our answer is therefore 
that the probability of exactly k misprints on page 95 is b(k; 2000, 1/500). 
Thus we have our answer: $b(k; 2000, 1/500) = \binom{2000}{k} (1/500)^k (1 - 1/500)^{2000-k}$. Unfortunately, 
it’s hard to use: for large numbers the binomial distribution is awkward to work 
with because there is a lot of calculation involved and numbers can be very large or very 
small. Can we get a more convenient answer? Yes. There is a nice approximation which 
we will now discuss. 


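The function $f_X(k) = \frac{\lambda^k e^{-\lambda}}{k!}$ (for k = 0, 1, 2, ...) is also denoted by $p(k; \lambda)$ and is called the Poisson 
distribution. Clearly the $p(k; \lambda)$ are positive. Also, they sum to one: 

$$\sum_{k=0}^{\infty} e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda}e^{\lambda} = 1.$$

We have used the Taylor series expansion, obtained in calculus courses, $\sum_{k=0}^{\infty} \lambda^k/k! = e^{\lambda}$. In 
a similar manner, it can be shown that 

$$E(X) = \lambda \quad\text{and}\quad \mathrm{Var}(X) = \lambda.$$

Thus, a Poisson distributed random variable X has the remarkable property that $E(X) = \lambda$ 
and $\mathrm{Var}(X) = \lambda$, where $\lambda > 0$ is the parameter in the distribution function 

$$p(k; \lambda) = \frac{\lambda^k}{k!}\, e^{-\lambda}.$$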

We now return to our binomial distribution b(k; 2000, 1/500). The Poisson is a good 
approximation to b(k; n, p) when n is large and np is not large. In this case, take $\lambda = np$, 
the mean of the binomial distribution. For our problem, $\lambda = 2000(1/500) = 4$, which is not 
large when compared to the other numbers in the problem, namely 2,000 and 500. Let’s 
compute some estimates for $P_k$, the probability of exactly k errors on page 95: 

$$P_0 = e^{-4} \approx 0.0183, \quad P_1 = 4e^{-4} \approx 0.0733, \quad P_3 = 4^3 e^{-4}/3! \approx 0.1954,$$

and so on. □
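How good is the Poisson approximation here? A short sketch can print b(k; 2000, 1/500) next to p(k; 4) for small k. The helper names `b` and `poisson` are hypothetical; both use only Python's standard `math` module.

```python
import math

def b(k, n, p):
    """Binomial probability b(k; n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson(k, lam):
    """Poisson probability p(k; lambda) = lambda^k e^(-lambda) / k!."""
    return lam**k * math.exp(-lam) / math.factorial(k)

n, p = 2000, 1 / 500
lam = n * p                         # 4 for the misprint example
for k in range(8):
    print(f"k={k}: binomial {b(k, n, p):.4f}, Poisson {poisson(k, lam):.4f}")
```

The two columns agree closely for every k shown, and the Poisson values for k = 0, 1 and 3 match the estimates computed above.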


Our final example of a random variable X has its underlying sample space $U = \mathbb{R}$, 
the real numbers. Rather than starting with a description of X itself, we start with the 
distribution function $f_X(x) = \phi_{\mu,\sigma}(x)$, called the normal distribution function with mean 
$\mu$ and standard deviation $\sigma$: 

$$\phi_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$

For computations concerning the normal distribution, it suffices in most problems to work 
with the special case when $\mu = 0$ and $\sigma = 1$. In this case, we use the notation 

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

where $\phi(x) = \phi_{0,1}(x)$ is called the standard normal distribution. 


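The function $\phi(x)$ is defined for $-\infty < x < \infty$ and is symmetric about x = 0. The 
maximum of $\phi(x)$ occurs at x = 0 and is about 0.4. Here is a graph of $\phi(x)$: 

[Figure: graph of $\phi(x)$, with the area under the curve above the interval from 0 to t shaded.]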
In this graph of $\phi(x)$ shown above, the area between the curve and the interval from 0 to 
t on the x-axis is shaded. This area, as we shall discuss below, represents the probability 
that a random variable with distribution function $\phi$ lies between 0 and t. For t = 1 the 
probability is about 0.34, for t = 1.5 the probability is about 0.43, and for t = 2 the 
probability is about 0.48. Since this is a probability distribution, the area under the whole 
curve is 1. Also, since the curve is symmetric, the area for x < 0 is 1/2. We’ll use these 
values in the examples and problems, so you will want to refer back to them. 


Example 26 (The normal distribution and probabilities) The way the normal 
curve relates to probability is more subtle than in the finite or discrete case. If a random 
variable X has $\phi_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$ as its distribution function, then we compute the 
probability of any event of the form $[a,b] = \{x \mid a \le x \le b\}$ by computing the area under 
the curve $\phi_{\mu,\sigma}(x)$ and above the interval [a,b]. 


How can we compute this area? Tables and computer programs for areas below 
$y = \phi(x)$ are available. Unfortunately $\phi_{\mu,\sigma}$ and $\phi$ are different functions unless $\mu = 0$ 
and $\sigma = 1$. Fortunately, there is a simple recipe for converting one to the other. Let 
$h(t) = (t - \mu)/\sigma$. The area below $\phi_{\mu,\sigma}(x)$ above the interval [a,b] equals the area below $\phi$ 
above the interval [h(a), h(b)]. 


A farmer weighs some oranges from his crop and comes to you for help. From his data 
you notice that the mean weight is 8 ounces and the standard deviation is 0.67 ounces. 
You’ve read somewhere (Was it here?) that for such things a normal distribution is a good 
approximation to the weight. The farmer can sell oranges that weigh at least 9 ounces at 
a higher price per ounce, so he wants to estimate what fraction of his crop weighs at least 
9 ounces. Using our recipe, h(9) = (9 − 8)/0.67 ≈ 1.5. We know that the area under $\phi(x)$ 
for the interval [0, 1.5] is 0.43. Since the area under $\phi(x)$ for x < 0 is 1/2, the area for 
x < 1.5 is 0.43 + 0.5 = 0.93. Since these are the “underweight” oranges, the farmer can 
expect about 7% of his crop to be at least 9 ounces. □
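In place of a table, the areas under φ can be computed from the error function, since the area to the left of x equals (1 + erf(x/√2))/2. This is a minimal sketch using Python's `math.erf`; the names `standard_normal_cdf` and `normal_area` are hypothetical, and `normal_area` applies the recipe h(t) = (t − μ)/σ from the text.

```python
import math

def standard_normal_cdf(x):
    """Area under phi to the left of x, via the error function math.erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def normal_area(a, b, mu, sigma):
    """Area under phi_{mu,sigma} above [a, b], using h(t) = (t - mu) / sigma."""
    def h(t):
        return (t - mu) / sigma
    return standard_normal_cdf(h(b)) - standard_normal_cdf(h(a))

# The areas between 0 and t quoted in the text.
for t in (1, 1.5, 2):
    print(t, round(standard_normal_cdf(t) - 0.5, 2))   # 0.34, 0.43, 0.48

# Oranges: mean 8 oz, standard deviation 0.67 oz; fraction weighing at least 9 oz.
heavy = normal_area(9, math.inf, mu=8, sigma=0.67)
print(f"{heavy:.3f}")   # a little under 0.07
```

The exact value of h(9) is about 1.49 rather than 1.5, which is why the computed fraction comes out slightly below 7%.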


Example 27 (Approximating the binomial distribution) Recall the binomial distribution 
from Example 24: b(k; n, p) is the probability of exactly k heads in n tosses and 
p is the probability of a head on one toss. We derived the formula $b(k; n, p) = \binom{n}{k} p^k q^{n-k}$, 
where q = 1 − p. We also found that, for a binomial random variable X, E(X) = np and 
Var(X) = npq. How does the random variable behave when n is large? We already saw in 
Example 25 how to use the Poisson approximation when E(X) is not large. When E(X) 
and Var(X) are large, a better approximation is given by the normal distribution $\phi_{\mu,\sigma}$ with 

$$\mu = np \quad\text{and}\quad \sigma = \sqrt{npq}.$$


Suppose that our book in Example 25 is a lot worse: About one word in ten is wrong. 
How can we estimate the probability of at most 30 errors on page 95? If the errors are 
independent, the distribution is a binomial with p = 0.1 and n equal to the number of words 
on page 95. We estimate that n is about 400. Thus we are dealing with b(k; 400, 0.1). We 
have 

$$\mu = 400 \times 0.1 = 40 \quad\text{and}\quad \sigma = \sqrt{400 \times 0.1 \times 0.9} = \sqrt{36} = 6.$$

Thus we want the area under $\phi(x)$ for $x \le h(30) = (30 - 40)/6 \approx -1.67$. By the symmetry 
of $\phi$, this is the area under $\phi(x)$ for $x \ge 1.67$. The area under $\phi(x)$ over [0, 1.67] is about 0.45 
(it lies between the values 0.43 and 0.48 given earlier for t = 1.5 and t = 2), so the probability 
is about 0.5 − 0.45 = 0.05, or roughly 5%. 
We’ve done some rounding off here, which is okay since our estimates are rather crude. 
There are ways to improve the estimates, but we will not discuss them. □
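For a problem of this size the exact binomial sum is also easy to compute by machine, so the normal estimate can be compared with it directly. This is a minimal sketch under the example's assumptions (n = 400 words, p = 0.1); the helper names are hypothetical, and no continuity correction or other refinement is applied.

```python
import math

def standard_normal_cdf(x):
    """Area under phi to the left of x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def binomial_cdf(m, n, p):
    """Exact P(X <= m) for X with distribution b(k; n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1))

n, p = 400, 0.1
mu, sigma = n * p, math.sqrt(n * p * (1 - p))   # about 40 and 6
exact = binomial_cdf(30, n, p)
normal = standard_normal_cdf((30 - mu) / sigma)  # area under phi for x <= h(30)
print(f"exact {exact:.4f}, normal approximation {normal:.4f}")
```

Both numbers are close to 5%, in line with the crude estimate in the example.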


Approximations like those in the preceding example are referred to as “limit theorems” 
in probability theory. The next example discusses the use of an important limit theorem, 
the Central Limit Theorem, for estimating how close an average of measurements is to 
the true value of a number. This is often used in experimental science when estimating a 
physical constant. 


*Example 28 (The Central Limit Theorem and the normal distribution) Suppose
a student must estimate a quantity, say the distance between two buildings on campus. The
student makes a number $n$ of measurements. Each measurement can be thought of as a
sample of a random variable. Call the random variable for the $i$th measurement $X_i$. If the student
is not influenced by the previous measurements, we can think of the random variables as
being independent and identically distributed. The obvious thing to do is average these
measurements. How accurate is the result?


Let’s phrase this in probabilistic terms. We have a new random variable given by
$X = (X_1 + \cdots + X_n)/n$, and our average is a sample of the value of the random variable $X$.
What can we say about $X$?


We can approximate $X$ with a normal distribution. This approximation is a consequence
of the Central Limit Theorem. Let $A_1$ be the average of the $n$ measurements
and let $A_2$ be the average of the squares of the $n$ measurements. Then we estimate the mean $\mu$
and standard deviation $\sigma$ of $X$ by $A_1$ and $\sqrt{(A_2 - A_1^2)/(n-1)}$, respectively.¹¹ We could now use $\varphi_{\mu,\sigma}$ to estimate the
distribution of the random variable $X$.


This can be turned around: $\varphi_{\mu,\sigma}$ can also be used to estimate the true mean of the
random variable $X$. You might have thought that $A_1$ was the mean. No. It is just
the average of some observed values. Since the area under the standard normal density between 0 and 1
is about 0.34, the probability that the true mean of $X$ lies in
$[A_1 - \sigma, A_1 + \sigma]$ is approximately $0.34 + 0.34 = 0.68$. □
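Here is a minimal Python sketch of the estimates in Example 28. It assumes the measurements are already collected in a list, and the distances used below are hypothetical. It computes $A_1$, $A_2$ and $\sqrt{(A_2 - A_1^2)/(n-1)}$ as described, then reports the interval $[A_1 - \sigma, A_1 + \sigma]$. The function name is our own.

```python
import math


def estimate_mean_and_sigma(measurements: list[float]) -> tuple[float, float]:
    """Return (A1, sigma_hat) for the average of the measurements.

    A1 is the average of the measurements, A2 the average of their squares,
    and sigma_hat = sqrt((A2 - A1**2) / (n - 1)) estimates the standard
    deviation of the average X.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements")
    a1 = sum(measurements) / n
    a2 = sum(x * x for x in measurements) / n
    sigma_hat = math.sqrt((a2 - a1 * a1) / (n - 1))
    return a1, sigma_hat


# Hypothetical distances (in meters) between the two buildings.
data = [101.0, 99.5, 100.5, 100.0, 99.0]
a1, s = estimate_mean_and_sigma(data)
print(f"A1 = {a1:.2f}, sigma = {s:.3f}, interval [{a1 - s:.2f}, {a1 + s:.2f}]")
```

For the hypothetical data, $A_1 = 100$ and the estimate of $\sigma$ is about 0.354. When the measurements are large compared with their spread, computing $A_2 - A_1^2$ directly can lose precision in floating point. Averaging the squared deviations from $A_1$ gives the same quantity more stably.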


We've looked at several different distributions: binomial, normal, Poisson and marginal. 
What do we use when? How are they related? 


The binomial distribution occurs when you have a sequence of repeated independent
events and want to know how many times a certain event occurred; for example, the
probability of $k$ heads in $n$ tosses of a coin. The coin tosses are the repeated independent
events and the heads are the events we are interested in.


The normal distribution is usually an approximation for estimating a number whose
value is the sum of a lot of (nearly) independent random variables. For example, let $X_i$ be
1 or 0 according as the $i$th coin toss is a head or tail. We want to know the probability
that $X_1 + X_2 + \cdots + X_n$ equals $k$. The exact answer is the binomial distribution. The
normal distribution gives an approximation.


The Poisson distribution is associated with rare events. For example, if light bulbs fail
at random (we’re not being precise here) and have an average lifetime $L$, then the number
of failures in a time interval $T$ is roughly Poisson if $\lambda = T/L$ is not too big or too small.
Another example is errors in a text. They are rare, and their number on a page has a
binomial distribution that the Poisson distribution approximates, as in Example 25.


¹¹ The estimate for $\sigma$ is a result from statistics. We cannot derive it here.
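To illustrate why rare events lead to the Poisson distribution, this Python sketch compares $b(k;n,p)$ with the Poisson probability $e^{-\lambda}\lambda^k/k!$ for $\lambda = np$. The values $n = 1000$ and $p = 0.002$ are hypothetical, chosen only to make the events rare.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """b(k; n, p): probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """p(k; lambda) = e^(-lambda) * lambda^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical rare-event setting: many trials, small success probability.
n, p = 1000, 0.002
lam = n * p          # lambda = np = 2
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

The two columns printed for each $k$ should be nearly identical.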


Unlike the previous three distributions, which exist by themselves, a marginal distribution
is always derived from some given distribution. In our coin toss experiment, let $X$
be the number of heads and let $Y$ be the number of times two or more tails occur together.
We could ask for the distribution given by $P(X = k \text{ and } Y = j)$. This is called a “joint distribution”
for the random variables $X$ and $Y$. Given the joint distribution, we could ask for
the distribution of just one of the random variables. These are “marginal distributions”
associated with the joint distribution. In this example, $P(X = k)$ and $P(Y = j)$ are marginal
distributions. The first one (the probability of $k$ heads) is the sum of $P(X = k \text{ and } Y = j)$
over all $j$, and the second (the probability of two or more tails together happening $j$ times)
is the sum of $P(X = k \text{ and } Y = j)$ over all $k$.
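The marginal computation can be made concrete by brute force. This Python sketch enumerates all $2^n$ equally likely toss sequences for a fair coin, taking $n = 4$ for illustration. It builds the joint distribution of $X$ (the number of heads) and $Y$, then sums over one variable to get each marginal. We read “the number of times two or more tails occur together” as the number of maximal runs of at least two consecutive T's. That reading, and the helper name `tail_runs`, are our own assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def tail_runs(tosses: str) -> int:
    """Number of maximal runs of two or more consecutive T's."""
    runs, length = 0, 0
    for t in tosses + "H":          # a sentinel H closes a final run of T's
        if t == "T":
            length += 1
        else:
            if length >= 2:
                runs += 1
            length = 0
    return runs


n = 4  # number of tosses (chosen for illustration)

# Joint distribution: P(X = k and Y = j) for a fair coin.
joint = Counter()
for seq in product("HT", repeat=n):
    s = "".join(seq)
    joint[(s.count("H"), tail_runs(s))] += Fraction(1, 2**n)

# Marginals: sum the joint distribution over the other variable.
marginal_x, marginal_y = Counter(), Counter()
for (k, j), prob in joint.items():
    marginal_x[k] += prob   # P(X = k) = sum over all j
    marginal_y[j] += prob   # P(Y = j) = sum over all k

print(dict(marginal_x))
print(dict(marginal_y))
```

Summing the joint table over $j$ recovers the binomial probabilities for $X$. For example, the sketch gives $P(X = 2) = 6/16$.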


Exercises for Section 4 


4.1. A fair coin is tossed four times, recording H if heads, T if tails. Let $X$ be the
random variable defined by $X(t_1t_2t_3t_4) = |\{i \mid t_i = H\}|$. Let $Y$ be the random
variable defined by

$Y(t_1t_2t_3t_4) = \begin{cases} 0, & \text{if } t_i = T \text{ for all } i = 1,2,3,4;\\ \max\{k \mid H = t_i = t_{i+1} = \cdots = t_{i+k-1},\ i = 1,2,3,4\}, & \text{otherwise.} \end{cases}$

The random variable $X$ equals the number of H’s. The random variable $Y$ equals
the length of the longest consecutive string of H’s. Compute

(a) the joint distribution function $h_{X,Y}$,
(b) the marginal distributions $f_X$ and $f_Y$,
(c) the covariance $\mathrm{Cov}(X,Y)$, and
(d) the correlation $\rho(X,Y)$.

Give an intuitive explanation of the value of $\rho(X,Y)$.


4.2. Let $X$ and $Y$ be random variables on a sample space $U$ and let $a$ and $b$ be real
numbers.

(a) Show that $\mathrm{Cov}(aX + bY, aX - bY)$ is $a^2\mathrm{Var}(X) - b^2\mathrm{Var}(Y)$.
(b) What is $\mathrm{Var}((aX - bY)(aX + bY))$?

4.3. Let $X$ be a random variable on a sample space $U$ and let $a$ and $b$ be real numbers.
What is $E((aX + b)^2)$ if

(a) $X$ has the binomial distribution $b(k;n,p)$?
(b) $X$ has the Poisson distribution $e^{-\lambda}\lambda^k/k!$?

4.4. A 100 page book has 200 misprints. If the misprints are distributed uniformly
throughout the book, show how to use the Poisson approximation to the binomial
distribution to calculate the probability of there being fewer than 4 misprints on
page 8.

4.5. Let $X$ and $Y$ be independent random variables and let $a$ and $b$ be real numbers.
Let $Z = aX + bY$. Then, for all $\epsilon > 0$, Tchebycheff’s inequality gives an upper
bound for $P(|Z - E(Z)| > \epsilon)$. Give this upper bound for the cases where

(a) $X$ and $Y$ have Poisson distributions $p(k;\gamma)$ and $p(k;\delta)$ respectively.
(b) $X$ and $Y$ have binomial distributions $b(k;n,r)$ and $b(k;n,s)$ respectively.

4.6. Each time a customer checks out at Super Save Groceries, a wheel with nine white
and one black dot, symmetrically placed around the wheel, is spun. If the black
dot is uppermost, the customer gets the least expensive item in their grocery cart
for free. Assuming the probability of any dot being uppermost is 1/10, what is the
probability that out of the first 1000 customers, between 85 and 115 customers get
a free item? Write the formula for the exact solution and show how the normal
distribution can be used to approximate this solution. You need not compute the
values of the normal distribution.

4.7. Let $X_1, \ldots, X_n$ be independent random variables each having mean $\mu$ and variance
$\sigma^2$. (These could arise by having one person repeat an experiment $n$ times, where the experiment
produces an estimate of a number whose value is $\mu$. See Example 28.) Let
$X = (X_1 + \cdots + X_n)/n$.

(a) Compute the mean and variance of $X$.
(b) Explain why an observed value of $X$ could be used as an estimate of $\mu$.
(c) It turns out that the error we can expect in approximating $\mu$ with $X$ is
proportional to the value of $\sigma_X$. Suppose we want to reduce this expected error
by a factor of 10. How much would we have to increase $n$? (In other words,
how many more measurements would be needed?)
1. In each case some information is given about a function. In which case is the
information not sufficient to define a function?

(a) $f \in 4^3$, $2 \to 3$, $1 \to 4$, $3 \to 2$.
(b) $f \in \{>,<,+,?\}^3$, $f = (?,<,+)$.
(c) $f \in 3^{\{>,<,+,?\}}$, $f = (3,1,2,3)$.
(d) $f \in 3^{\{>,<,+,?\}}$, $f = (3,1,2,3)$. Domain ordered as follows: $>, <, +, ?$.
(e) $f \in \{>,<,+,?\}^3$, $f = (?,<,+)$. Domain ordered as follows: 3, 2, 1.


2. The following function is in two line form:

   1 2 3 4 5 6 7 8 9
   9 3 7 2 6 4 5 1 8

Which of the following is a correct cycle form for this function?

(a) (1,8,9)(2,3,7,5,6,4)
(b) (1,9,8)(2,3,5,7,6,4)
(c) (1,9,8)(2,3,7,5,4,6)
(d) (1,9,8)(2,3,7,5,6,4)
(e) (1,9,8)(3,2,7,5,6,4)


3. In each case some information about a function is given to you. Based on this
information, which function is an injection?

(a) $f \in 6^5$, $\mathrm{Coimage}(f) = \{\{1\},\{2\},\{3\},\{4\},\{5\}\}$
(b) $f \in 6^6$, $\mathrm{Coimage}(f) = \{\{1\},\{2\},\{3\},\{4\},\{5,6\}\}$
(c) $f \in 5^5$, $f^{-1}(2) = \{1,3,5\}$, $f^{-1}(4) = \{2,4\}$
(d) $f \in 4^5$, $|\mathrm{Image}(f)| = 4$
(e) $f \in 5^5$, $\mathrm{Coimage}(f) = \{\{1,3,5\},\{2,4\}\}$

4. The following function is in two line form: f = [two-line form not legible in the source].
Which of the following is a correct cycle form for h = f%o f~'? 


(a) (1,6,8)(2,3,7)(5, 6 7
(b) (1,6, si 4,5)(3, 7,9)
(c) (1,8,6)(2,3,7)(5,9,4)
(d) (1,9,8)(2,3,5)(7,6,4)
(e) (8,5,9,2,4,1,3,6,7)


5. The following permutation is in two line form: f = [two-line form not legible in the source].
The permutation g = (1,2,3) is in cycle form. Let $h = f \circ g$ be the composition of f
and g. Which of the following is a correct cycle form for h?
(a) (1,6,9,5,2,4,7)(3,8)
(b) (1,8,3,4,7,2,6)(5,9)
(c) (1,8,3,7,4,2,6)(9,5)
(d) (1,8,4,3,7,2,6)(9,5)
(e) (8,6,4,7,9,1,2)(3,5)

6. We want to find the smallest integer $n > 0$ such that, for every permutation f on 4,
the function $f^n$ is the identity function on 4. What is the value of n?

(a) 4 (b) 6 (c) 12 (d) 24 (e) It is impossible.

7. In the lexicographic list of all strictly decreasing functions in $9^5$, find the successor of
98432.

(a) 98431 (b) 98435 (c) 98521 (d) 98532 (e) 98543

8. The 16 consecutive points 0, 1, ..., 14, 15 have 0 and 15 converted to exterior box
boundaries. The interior box boundaries correspond to points 1, 5, 7, 9. This
configuration corresponds to

(a) 9 balls into 5 boxes
(b) 9 balls into 6 boxes
(c) 10 balls into 5 boxes
(d) 10 balls into 6 boxes
(e) 11 balls into 4 boxes

9. The 16 consecutive points 0, 1, ..., 14, 15 have 0 and 15 converted to exterior box
boundaries. The interior box boundaries correspond to the strictly increasing functions
$1 \le x_1 < x_2 < x_3 < x_4 \le 14$ in lex order. How many configurations of balls into boxes
come before the configuration •||||•••••••••? (Exterior box boundaries are not
shown.)

(a) (3) (b) (7) (c) (3) (d) G) (e) ee

10. Suppose $f \in 7^6$. How many such functions have $|\mathrm{Image}(f)| = 4$?

(a) $S(7,4)$ (b) $S(7,4)\,(6)_4$ (c) $S(6,4)\,(7)_4$ (d) $S(4,7)\,(6)_4$ (e) $S(7,4)\,6!$

11. Let X be a random variable with distribution $b(k;n,p)$, $q = 1-p$. Let $Y = (X+1)^2$.
Then $E(Y) =$

(a) $npq + (np+1)^2$
(b) $2npq + (np+1)^2$
(c) $npq + 2(np+1)^2$
(d) $(npq)^2 + (np+1)^2$
(e) $2npq(np+1)^2$

12. Let X and Y be independent random variables with distributions $b(k;n,a)$ and $b(k;n,b)$
respectively. Let $Z = X + 2Y$. Then, for all $\epsilon > 0$, Tchebycheff’s inequality guarantees
that $P(|Z - na - 2nb| > \epsilon)$ is always less than or equal to what?
(a) $(na(1-a) + nb(1-b))/\epsilon^2$
(b) $(na(1-a) + 2nb(1-b))/\epsilon^2$
(c) $(na(1-a) + 4nb(1-b))/\epsilon^2$
(d) $(na(1-a) + 2nb(1-b))/\epsilon^3$
(e) $(na(1-a) + 4nb(1-b))/\epsilon^3$

13. An 800 page book has 400 misprints. If the misprints are distributed uniformly
throughout the book, and the Poisson approximation to the binomial distribution
is used to calculate the probability of exactly 2 misprints on page 16, which of the
following represents the correct use of the Poisson approximation?

(a) $e^{0.5}/8$ (b) $e^{-0.5}/8$ (c) $e^{0.5}/16$ (d) $e^{-0.5}/16$ (e) e- 9 8130

14. For 40 weeks, once per hour during the 40 hour work week, an employee of Best Cars
draws a ball from an urn that contains 1 black and 9 white balls. If black is drawn, a
$10 bill is tacked to a bulletin board. At the end of the 40 weeks, the money is given
to charity. What is the expected amount of money given?

(a) 1000 (b) 1200 (c) 1400 (d) 1600 (e) 1800

15. For 40 weeks, once per hour during the 40 hour work week, an employee of Best Cars
draws a ball from an urn that contains 1 black and 9 white balls. If black is drawn,
$10 is tacked to a bulletin board. At the end of the 40 weeks, the money is given to
charity. Using the normal approximation, what interval under the standard normal
curve should be used to get the area which equals the probability that $1800 or more
is given?

(a) from 1.67 to ∞
(b) from 0 to 1.67
(c) from 0.6 to ∞
(d) from 0 to 0.6
(e) from 0.6 to 1.67

16. A fair coin is tossed three times. Let X be the random variable which is one if the
first throw is T (for tails) and the third throw is H (for heads), zero otherwise. Let
Y denote the random variable that is one if the second and third throws are both H,
zero otherwise. The covariance, $\mathrm{Cov}(X,Y)$, is

(a) 1/8 (b) −1/8 (c) 1/16 (d) −1/16 (e) 1/32

17. A fair coin is tossed three times. Let X be the random variable which is one if the
first throw is T (for tails) and the third throw is H (for heads), zero otherwise. Let
Y denote the random variable that is one if the second and third throws are both H,
zero otherwise. The correlation, $\rho(X,Y)$, is

(a) 0 (b) 1/3 (c) −1/3 (d) 1/8 (e) 1/8

18. A fair coin is tossed three times and a T (for tails) or H (for heads) is recorded, giving
us a 3-long list. Let X be the random variable which is zero if no T has another T
adjacent to it, and is one otherwise. Let Y denote the random variable that counts
the number of T’s in the three tosses. Let $h_{X,Y}$ denote the joint distribution of X and
Y. $h_{X,Y}(1,2)$ equals

(a) 5/8 (b) 4/8 (c) 3/8 (d) 2/8 (e) 1/8

19. Which of the following is equal to $\mathrm{Cov}(X+Y, X-Y)$, where X and Y are random
variables on a sample space S?

(a) $\mathrm{Var}(X) - \mathrm{Var}(Y)$
(b) $\mathrm{Var}(X^2) - \mathrm{Var}(Y^2)$
(c) $\mathrm{Var}(X^2) + 2\mathrm{Cov}(X,Y) + \mathrm{Var}(Y^2)$
(d) $\mathrm{Var}(X^2) - 2\mathrm{Cov}(X,Y) + \mathrm{Var}(Y^2)$
(e) $(\mathrm{Var}(X))^2 - (\mathrm{Var}(Y))^2$

20. Which of the following is equal to $\mathrm{Var}(2X - 3Y)$, where X and Y are random variables
on S?

(a) $4\mathrm{Var}(X) + 12\mathrm{Cov}(X,Y) + 9\mathrm{Var}(Y)$
(b) $2\mathrm{Var}(X) - 3\mathrm{Var}(Y)$
(c) $2\mathrm{Var}(X) + 6\mathrm{Cov}(X,Y) + 3\mathrm{Var}(Y)$
(d) $4\mathrm{Var}(X) - 12\mathrm{Cov}(X,Y) + 9\mathrm{Var}(Y)$
(e) $2\mathrm{Var}(X) - 6\mathrm{Cov}(X,Y) + 3\mathrm{Var}(Y)$

21. The strictly decreasing functions in $100^3$ are listed in lex order. How many are there
before the function (9,5,4)?

(a) 18 (b) 23 (c) 65 (d) 98 (e) 180

22. All but one of the following have the same answer. Which one is different?

(a) The number of multisets of size 20 whose elements lie in 5.
(b) The number of strictly increasing functions from 20 to 24.
(c) The number of subsets of size 20 whose elements lie in 24.
(d) The number of weakly decreasing 4-lists made from 21.
(e) The number of strictly decreasing functions from 5 to 24.

23. Let X be a random variable with Poisson distribution $p(k;\lambda)$. Let $Y = (X+2)(X+1)$.
What is the value of $E(Y)$?
(a) $\lambda^2 + 3\lambda + 1$
(b) $\lambda^2 + 3\lambda + 2$
(c) $\lambda^2 + 4\lambda + 2$
(d) $3\lambda^2 + 3\lambda + 2$
(e) 442 + 442

Answers: 1 (d), 2 (e), 3 (c), 4 (a), 5 (a), 6 (e), 7 (e), 8 (c), 9 (a), 10 (d), 11 (b),
12 (c), 13 (e), 14 (b), 15 (a), 16 (b), 17 (b), 18 (a), 19 (b), 20 (b), 21 (b), 22 (d),
23 (ce), 24 (d), 25 (d).

1 (c), 2 (d), 3 (a), 4 (b), 5 (a), 6 (c), 7 (c), 8 (c), 9 (a),
10 (c), 11 (a), 12 (c), 13 (b), 14 (d), 15 (a), 16 (c), 17 (b), 18 (d), 19 (a), 20 (d),
21 (c), 22 (e), 23 (c).
Unit DT


Decision Trees and Recursion 


In many situations one needs to make a series of decisions. This leads naturally to 
a structure called a “decision tree.” Decision trees provide a geometrical framework for 
organizing the decisions. The important aspect is the decisions that are made. Everything 
we do in this unit could be rewritten to avoid the use of trees; however, trees 


• give us a powerful intuitive basis for viewing the problems of this chapter,
• provide a language for discussing the material,
• allow us to view the collection of all decisions in an organized manner.


We'll begin with elementary examples of decision trees. We then show how decision trees 
can be used to study recursive algorithms. Next we shall look at decision trees and 
“Bayesian methods” in probability theory. Finally we relate decision trees to induction 
and recursive equations. 


Section 1: Basic Concepts of Decision Trees 


One area of application for decision trees is systematically listing a variety of functions. 
The simplest general class of functions to list is the entire set $n^k$. We can create a typical
element in the list by choosing an element of n and writing it down, choosing another 
element (possibly the same as before) of n and writing it down next, and so on until we 
have made k decisions. This generates a function in one line form sequentially: First f(1) 
is chosen, then f(2) is chosen and so on. We can represent all possible decisions pictorially 
by writing down the decisions made so far and then some downward “edges” indicating the 
possible choices for the next decision. 


We begin this section by discussing the picture of a decision tree, illustrating this with 
a variety of examples. Then we study how a tree is traversed, which is a way computers 
deal with the trees. 




What is a Decision Tree? 


Example 1 (Decision tree for $2^3$) Here is an example of a decision tree for the functions
$2^3$. We’ve omitted the commas; for example, 121 stands for the function 1,2,1 in
one-line form.

R
1  2
11  12  21  22
111  112  121  122  211  212  221  222


The set
$V = \{R, 1, 2, 11, 12, 21, 22, 111, 112, 121, 122, 211, 212, 221, 222\}$


is called the set of vertices of the decision tree. The vertex set for a decision tree can be
any set, but must be specified in describing the tree. You can see from the picture of the
decision tree that the vertices appear at the places where the straight line segments (called
edges) of the tree end. Each vertex should appear exactly once in the
picture. The symbol R stands for the root of the decision tree. Various choices other than
R can be used as the symbol for the root.


The edges of a decision tree such as this one are specified by giving the pair of vertices 
at the two ends of the edge, top vertex first, as follows: (R,1), (21,212), etc. The vertices at 
the ends of an edge are said to be “incident” on that edge. The complete set of edges of this 
decision tree is the set E = { (R,1), (R,2), (1,11), (1,12), (2,21),(2,22), (11,111), (11,112), 
(12,121), (12,122), (21,211), (21,212), (22,221), (22,222)}. In addition to the edges, there 
are “labels,” either a “1” or a “2,” shown on the line segments representing the edges in the 
picture. This labeling of the edges can be thought of as a function from the set of edges, 
E, to the set {1,2}. 


If $e = (v,w)$ is an edge, the vertex $w$ is called a child of $v$, and $v$ is the parent of $w$.
The children of 22 are 221 and 222. The parent of 22 is 2. 


The degree of a vertex v is the number of edges incident on that vertex. The 
down degree of v is the number of edges e = (v, w) incident on that vertex (and below it); 
in other words, it is the number of children of v. The degree of 22 is 3 (counting edges 
(2, 22), (22, 221), (22, 222)). The down degree of 22 is 2 (counting edges (22, 221), (22, 222)). 
Vertices with down degree 0 are called leaves. All other vertices are called internal 
vertices. 
For any vertex $v$ in a decision tree there is a unique list of edges $(x_1,x_2), (x_2,x_3), \ldots,
(x_k,x_{k+1})$ from the root $x_1$ to $v = x_{k+1}$. This sequence is called the path to a vertex in
the decision tree. The number of edges, $k$, is the length of this path and is called the
height or distance of vertex $v$ from the root. The path to 22 is (R, 2), (2, 22). The height
of 22 is 2. □
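The vertex set, edge set, degrees and heights in Example 1 can be generated mechanically. The following is a Python sketch. It assumes vertices are named by the string of decisions made so far, with "R" for the root, so it only works for $n \le 9$. The function names are our own.

```python
from itertools import product


def decision_tree_for_functions(n: int, k: int):
    """Vertices and edges of the decision tree for the functions in n^k.

    Each vertex is the string of choices made so far ("R" is the root);
    each edge (v, w) goes from a vertex v down to a child w.
    """
    vertices, edges = ["R"], []
    for length in range(1, k + 1):
        for choices in product(range(1, n + 1), repeat=length):
            w = "".join(map(str, choices))
            v = w[:-1] or "R"          # parent: drop the last decision
            vertices.append(w)
            edges.append((v, w))
    return vertices, edges


V, E = decision_tree_for_functions(2, 3)


def degree(v: str) -> int:
    """Number of edges incident on v."""
    return sum(v in e for e in E)


def down_degree(v: str) -> int:
    """Number of children of v."""
    return sum(e[0] == v for e in E)


def height(v: str) -> int:
    """Length of the path from the root to v."""
    return 0 if v == "R" else len(v)


leaves = [v for v in V if down_degree(v) == 0]
print(len(V), len(E), degree("22"), down_degree("22"), height("22"))
print(leaves)
```

For $2^3$ this yields the 15 vertices and 14 edges listed above, with degree 3, down degree 2 and height 2 for vertex 22.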


The decision tree of the previous example illustrates the various ways of generating
a function in $2^3$ sequentially. It’s called a decision tree for generating the functions in
$2^3$. Each edge in the decision tree is labeled with the choice of function value to which
it corresponds. Note that the labeling does not completely describe the corresponding 
decision — we should have used something like “Choose 1 for the value of f(2)” instead of 
simply “1” on the line from 1 to 11. 


In this terminology, a vertex v represents the partial function constructed so far, when 
the vertex v has been reached by starting at the root, following the edges to v, and making 
the decisions that label the edges on that unique path from the root to v. The edges 
leading out of a vertex are labeled with all possible decisions that can be made next, given 
the partial function at the vertex. We labeled the edges so that the labels on edges out of 
each vertex are in order, 1,2, when read left to right. The leaves are the finished functions. 
Notice that the leaves are in lexicographic order. In general, if we agree to label the edges 
from each vertex in order, then any set of functions generated sequentially by specifying 
f(i) at the ith step will be in lex order. 
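A short recursive sketch shows why labeling edges in increasing order puts the leaves in lex order. This Python function (name hypothetical) walks the decision tree for $n^k$ depth first, tries the values $1, \ldots, n$ for the next decision in increasing order, and records each leaf when it is reached.

```python
def functions_in_lex_order(n: int, k: int) -> list[tuple[int, ...]]:
    """List the functions in n^k by walking their decision tree.

    At each vertex (a partial function f(1), ..., f(i)) the edges are
    followed in the order 1, 2, ..., n, so the leaves come out in lex order.
    """
    leaves: list[tuple[int, ...]] = []

    def visit(partial: tuple[int, ...]) -> None:
        if len(partial) == k:            # a leaf: the function is complete
            leaves.append(partial)
            return
        for value in range(1, n + 1):    # decide f(i + 1), smallest first
            visit(partial + (value,))

    visit(())
    return leaves


print(functions_in_lex_order(2, 3))
```

For $n = 2$, $k = 3$ it returns the eight leaves of Example 1 from left to right, and (2, 1, 2) is the sixth of them.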


To create a single function we start at the root and choose downward edges (i.e., make 
decisions) until we reach a leaf. This creates a path from the root to a leaf. We may 
describe a path in any of the following ways: 


• the sequence of vertices $v_0, v_1, \ldots, v_m$ on the path from the root $v_0$ to the leaf $v_m$;
• the sequence of edges $e_1, e_2, \ldots, e_m$, where $e_i = (v_{i-1}, v_i)$, $i = 1, \ldots, m$;
• the sequence of decisions $D_1, D_2, \ldots, D_m$, where $e_i$ is labeled with decision $D_i$.


We illustrate with three descriptions of the path from the root R to the leaf 212 in Exam- 
ple 1: 


• the vertex sequence is R, 2, 21, 212;
• the edge sequence is (R, 2), (2, 21), (21, 212);
• the decision sequence is 2, 1, 2.


Decision trees are a part of a more general subject in discrete mathematics called
“graph theory,” which is studied in another unit.


It is now time to look at some more challenging examples so that we can put decision 
trees to work for us. The next example involves counting words where the decisions are 
based on patterns of consonants and vowels. 
Example 2 (Counting words) Using the 26 letters of the alphabet and considering the
letters AEIOUY to be vowels, how many five letter “words” (i.e., five long lists of letters)
are there, subject to the following rules? 


(a) No vowels are ever adjacent. 
(b) There are never three consonants adjacent. 
(c) Adjacent consonants are always different. 


To start with, it would be useful to have a list of all the possible patterns of consonants 
and vowels; e.g., CCVCV (with C for consonant and V for vowel) is possible but CVVCV 
and CCCVC violate conditions (a) and (b) respectively and so are not possible. We’ll use 
a decision tree to generate these patterns in lex order. Of course, a pattern CVCCV can 
be thought of as a function f where f(1) =C, f(2)=V,..., f(5) =V. 


We could simply try to list the patterns (functions) directly without using a deci- 
sion tree. The decision tree approach is preferable because we are less likely to overlook 
something. The resulting tree can be pictured as follows: 


—
C  V
CC  CV  VC
CCV  CVC  VCC  VCV
CCVC  CVCC  CVCV  VCCV  VCVC
CCVCC  CCVCV  CVCCV  CVCVC  VCCVC  VCVCC  VCVCV


At each vertex there are potentially two choices, but at some vertices only one is possible 
because of rules (a) and (b) above. Thus there are one or two decisions at each vertex. You 
should verify that this tree lists all possibilities systematically. 


We have used the dash “—” as the symbol for the root. This stands for the empty word 
on the letters C and V. The set of labels for the vertices of this decision tree T is a set
of words of length 0 through 5. The vertex set is determined by the rules (or “syntax”)
associated with the problem (rules (a), (b), and (c) above). 


Using the rules of construction, (a), (b), and (c), we can now easily count the number
of words associated with each leaf. The total number of words is the sum of these individual
counts. For CCVCC we obtain $(20 \times 19)^2 \times 6$; for CCVCV, CVCCV, VCCVC, and VCVCC
we obtain $(20 \times 19) \times 20 \times 6^2$; for CVCVC we obtain $20^3 \times 6^2$; for VCVCV we obtain
$20^2 \times 6^3$. Adding these gives 2,335,200 words. □
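The pattern tree and the per-leaf counts can be reproduced in a Python sketch (function names hypothetical). A recursive generator applies rules (a) and (b) to build the C/V patterns in lex order. Rule (c) is then applied when counting the words for each pattern.

```python
VOWELS, CONSONANTS = 6, 20   # AEIOUY are vowels; the other 20 letters are consonants


def patterns(length: int) -> list[str]:
    """C/V patterns obeying rules (a) and (b), generated in lex order."""
    result: list[str] = []

    def extend(word: str) -> None:
        if len(word) == length:
            result.append(word)          # a leaf of the decision tree
            return
        for letter in "CV":              # C before V gives lex order
            candidate = word + letter
            if candidate.endswith("VV"):     # rule (a): no adjacent vowels
                continue
            if candidate.endswith("CCC"):    # rule (b): no three adjacent consonants
                continue
            extend(candidate)

    extend("")
    return result


def words_for_pattern(pattern: str) -> int:
    """Number of words with this C/V pattern, using rule (c)."""
    count = 1
    for i, letter in enumerate(pattern):
        if letter == "V":
            count *= VOWELS
        elif i > 0 and pattern[i - 1] == "C":
            count *= CONSONANTS - 1      # must differ from the previous consonant
        else:
            count *= CONSONANTS
    return count


leaves = patterns(5)
total = sum(words_for_pattern(p) for p in leaves)
print(leaves, total)
```

The generated leaves are the seven patterns in the tree, and the total is the sum of the counts listed above, 2,335,200.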


Definition 1 (Rank of an element of a list) The rank of an element in a list is
the number of elements that appear before it in the list. The rank of a leaf of a decision
tree is the number of leaves that are to the left of it in the picture of the tree. The rank is
denoted by the function RANK.


In Example 1, RANK(212) = 5 and RANK(111) = 0. 
Rank can be used to store data in a computer. For example, suppose we want to store
information for each of the $10! \approx 3.6 \times 10^6$ permutations of 10. The naive way to do this
is to have a $10 \times 10 \times \cdots \times 10$ array and store information about the permutation $f$ in
location $f(1), \ldots, f(10)$. This requires storage of size $10^{10}$. If we store information about
permutations in a one dimensional array with information about $f$ stored at RANK($f$),
we only need $10!$ storage locations, which is much less. We’ll discuss ranking permutations
soon.


The inverse of the rank function is also useful. Suppose we want to generate objects
at random from a set of $n$ objects. Let RANK be a rank function for them. Generate a
number $k$ between 0 and $n-1$ inclusive at random, with all values equally likely. Then
RANK$^{-1}(k)$ is a random object, and each object is equally likely.
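For the functions in $n^k$ listed in lex order, as in Example 1, the rank has a simple closed form. If we read $f(1)-1, \ldots, f(k)-1$ as the digits of a base-$n$ numeral, its value is RANK($f$). Below is a Python sketch of RANK and its inverse under that assumption. The formula covers $n^k$ only, not the permutation orders discussed next, and the names `rank` and `unrank` are our own.

```python
import random


def rank(f: tuple[int, ...], n: int) -> int:
    """RANK of f in the lex-order list of n^k (values 1..n).

    Reading f(1)-1, ..., f(k)-1 as base-n digits counts the functions
    that come before f.
    """
    r = 0
    for value in f:
        r = r * n + (value - 1)
    return r


def unrank(r: int, n: int, k: int) -> tuple[int, ...]:
    """Inverse of rank: the function in n^k whose RANK is r."""
    digits = []
    for _ in range(k):
        r, d = divmod(r, n)
        digits.append(d + 1)
    return tuple(reversed(digits))


# A uniformly random element of 2^3: pick a rank at random, then unrank it.
random_function = unrank(random.randrange(2**3), 2, 3)
print(rank((2, 1, 2), 2), unrank(5, 2, 3), random_function)
```

With $n = 2$ this gives RANK(212) = 5 and RANK(111) = 0, as in Example 1.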


Example 3 (Permutations in lexicographic order) Recall that we can think of a
permutation on 3 as a bijection $f: 3 \to 3$. Its one-line form is $f(1), f(2), f(3)$. Here is an
example of a decision tree for this situation (omitting commas):


1  2  3
12  13  21  23  31  32
123  132  213  231  312  321


Because we first chose f(1), listing its values in increasing order, then did the same with 
f(2) and finally with f(3), the leaves are listed lexicographically, that is, in “alphabetical” 
order like a dictionary only with numbers instead of letters. 


We could have abbreviated this decision tree a bit by shrinking the edges coming from 
vertices with only one decision and omitting labels on nonleaf vertices. As you can see, 
there is no “correct” way to label a decision tree. The intermediate labels are simply a 
tool to help you correctly list the desired objects (functions in this case) at the leaves. 
Sometimes one may even omit the function at the leaf and simply read it off the tree by 
looking at the labels on the edges or vertices associated with the decisions that lead from 
the root to the leaf. In this tree, the labels on an edge going down from vertex v tell us what 
values to add to the end of v to get a “partial permutation” that is one longer than v. □
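We can walk the decision tree of Example 3 recursively. In this Python sketch (function name hypothetical), the values available at each vertex are those not yet used, tried in increasing order, so the leaves come out in lex order.

```python
def permutations_lex(n: int) -> list[tuple[int, ...]]:
    """Leaves of the decision tree for permutations of n, left to right.

    At each vertex we append any value not yet used, trying the values
    in increasing order, so the leaves appear in lexicographic order.
    """
    leaves: list[tuple[int, ...]] = []

    def visit(partial: tuple[int, ...]) -> None:
        if len(partial) == n:
            leaves.append(partial)
            return
        for value in range(1, n + 1):
            if value not in partial:     # f must be a bijection
                visit(partial + (value,))

    visit(())
    return leaves


print(permutations_lex(3))
```

For $n = 3$ it returns the six leaves 123, 132, 213, 231, 312, 321 in the order shown in the tree.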
Permutations


We now look at two ways of ranking permutations. The method for generating random 
permutations in Example 23 of Unit Fn can provide another method for ranking permu- 
tations. If you’d like a challenge, you can think about how to do that. We won’t discuss 
it. 


Example 4 (Permutations in direct insertion order) Another way to create a
permutation is by direct insertion. (Often this is simply called “insertion.”) Suppose that
we have an ordered list of $k$ items into which we want to insert a new item. It can be placed
in any of $k+1$ places; namely, at the end of the list or immediately before the $i$th item
where $1 \le i \le k$. By starting out with 1, choosing one of the two places to insert 2 in this
list, choosing one of the three places to insert 3 in the new list and, finally, choosing one of
the four places to insert 4 in the newest list, we will have produced a permutation of 4. To
do this, we need to have some convention as to how the places for insertion are numbered
when the list is written from left to right. The obvious choice is from left to right; however,
right to left is often preferred. We’ll use right to left. One reason for this choice is that the
leftmost leaf is 12...n as it is for lex order.


If there are $k+1$ possible positions to insert something, we number the positions
$0, 1, \ldots, k$, starting with the rightmost position as number 0. We’ll use the notation $(\,)_i$ to
stand for position $i$ so that we can keep track of the positions when we write a list into
which something is to be inserted.

Here’s the derivation of the permutation of 4 associated with the insertions 1, 1 and 2. 


• Start with $(\,)_1\, 1\, (\,)_0$.


• Choose the insertion of the symbol 2 into position 1 (designated by $(\,)_1$) to get
21. With positions of possible insertions indicated, this is $(\,)_2\, 2\, (\,)_1\, 1\, (\,)_0$.


• Now insert symbol 3 into position 1 (designated by $(\,)_1$) to get 231 or, with possible
insertions indicated, $(\,)_3\, 2\, (\,)_2\, 3\, (\,)_1\, 1\, (\,)_0$.


• Finally, insert symbol 4 into position 2 to get 2431.


Here is the decision tree for permutations of 4 in direct insertion order. We’ve turned
the vertices sideways so that rightmost becomes topmost in the insertion positions.


[Figure: decision tree for the permutations of 4 in direct insertion order, with 12 and 21 at the first level below the root.]


The labels on the vertices are, of course, the partial permutations, with the full permutations 
appearing on the leaves. The decision labels on the edges are the positions in which to insert 
the next number. Notice that the labels on the leaves are no longer in lex order because we 
constructed the permutations differently. Had we labeled the vertices with the positions 
used for insertion, the leaves would then be labeled in lex order. For example, 2413 becomes 
1,0,2, which is gotten by reading edge labels on the path from the root to the leaf labeled 
2413. Similarly 4213 becomes 1,0,3.


Like the method of lex order generation, the method of direct insertion generation 
can be used for other things besides permutations. However, direct insertion cannot be 
applied as widely as lex order. Lex order generation works with anything that can be 
thought of as an (ordered) list, but direct insertion requires more structure. Note that
RANK(3412) = 10 and RANK(4321) = 23. What would RANK(35412) be for the
permutations on 5 in direct insertion order? □
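Direct insertion order and its rank can be sketched in Python as follows (helper names hypothetical). Positions are numbered from the right starting at 0, as in the text. The rank is the lex rank of the sequence of insertion positions, in which the symbol $s$ has $s$ possible positions $0, \ldots, s-1$.

```python
def insert_by_positions(positions: list[int]) -> list[int]:
    """Build a permutation by direct insertion.

    positions[j] is where symbol j + 2 is inserted; positions are
    numbered 0, 1, ... starting from the right end of the current list.
    """
    perm = [1]
    for symbol, pos in enumerate(positions, start=2):
        # Position 0 is the right end; position len(perm) is the left end.
        perm.insert(len(perm) - pos, symbol)
    return perm


def positions_of(perm: list[int]) -> list[int]:
    """Recover the insertion positions by removing n, n-1, ..., 2 in turn."""
    perm = list(perm)
    positions = []
    for symbol in range(len(perm), 1, -1):
        i = perm.index(symbol)
        positions.append(len(perm) - 1 - i)   # count positions from the right
        perm.pop(i)
    return positions[::-1]


def rank_direct_insertion(perm: list[int]) -> int:
    """RANK in direct insertion order: lex rank of the position sequence."""
    r = 0
    for symbol, pos in enumerate(positions_of(perm), start=2):
        r = r * symbol + pos   # symbol has `symbol` possible positions 0..symbol-1
    return r


print(insert_by_positions([1, 1, 2]), positions_of([2, 4, 1, 3]))
print(rank_direct_insertion([3, 4, 1, 2]), rank_direct_insertion([4, 3, 2, 1]))
```

With these conventions, the insertions 1, 1, 2 give 2431, the permutation 2413 gives the positions 1,0,2, and the ranks of 3412 and 4321 are 10 and 23, as stated above. The same functions apply to permutations of 5.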


Traversing Decision Trees 


We conclude this section with an important class of search algorithms called backtrack- 
ing algorithms. In many computer algorithms it is necessary either to systematically 
inspect all the vertices of a decision tree or to find the leaves of the tree. An algorithm that 
systematically inspects all the vertices (and so also finds the leaves) is called a traversal of 
the tree. How can we create such an algorithm? To understand how to do this, we first 
look at how a tree can be “traversed.” 
Example 5 (Traversals of a decision tree) Here is a sample decision tree T with edges
labeled A through J and vertices labeled 1 through 11.


The arrows are not part of the decision tree, but will be helpful to us in describing certain 
ideas about linear orderings of vertices and edges that are commonly associated with deci- 
sion trees. Imagine going around (“traversing”) the decision tree following arrows. Start at 
the root, 1, go down edge A to vertex 2, etc. Here is the sequence of vertices as encountered 
in this process: 1, 2, 4, 2, 5, 2, 1, 3, 6, 8, 6, 9, 6, 10, 6, 11, 6, 3, 7, 3, 1. This sequence 
of vertices is called the depth first vertex sequence, DFV(T), of the decision tree T. The
number of times each vertex appears in DFV(T) is one plus the down degree of that vertex.
For edges, the corresponding sequence is A, C, C, D, D, A, B, E, G, G, H, H, I, I, J, J, E,
F, F, B. This sequence is the depth first edge sequence, DFE(T), of the tree. Every edge
appears exactly twice in DFE(T). If the vertices of the tree are read left to right, top to 
bottom, we obtain the sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11. This is called the breadth 
first vertex sequence, BFV(T). Similarly, the breadth first edge sequence, BFE(T), is A, 
B, C, D, E, F, G, H, I, J. 


The sequences BFV(T) and BFE(T) are linear orderings of the vertices and edges of
the tree T (i.e., each vertex or edge appears exactly once in the sequence). We also associate
two linear orderings with DFV(T):


e PREV(T), called the preorder sequence of vertices of T, is the sequence of first occur- 
rences of the vertices of T in DF V(T). 


e POSV(T), called the postorder sequence of vertices of T, is the sequence of last occur- 
rences of the vertices of T in DF V(T). 


For the present tree
PREV(T) = 1, 2, 4, 5, 3, 6, 8, 9, 10, 11, 7 and POSV(T) = 4, 5, 2, 8, 9, 10, 11, 6, 7, 3, 1.


With a little practice, you can quickly construct PREV(T) and POSV(T) directly from 
the picture of the tree. For PREV(T), follow the arrows and list each vertex the first time 
you encounter it (and only then). For POSV(T), follow the arrows and list each vertex the 
last time you encounter it (and only then). Notice that the order in which the leaves of T 
appear, 4, 5, 8, 9, 10, 11, is the same in both PREV(T) and POSV(T). □
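The “first occurrence” and “last occurrence” rules translate directly into code. This Python sketch starts from DFV(T) exactly as listed in Example 5; the function names `preorder` and `postorder` are illustrative.

```python
# DFV(T) for the tree of Example 5, copied from the text.
DFV = [1, 2, 4, 2, 5, 2, 1, 3, 6, 8, 6, 9, 6, 10, 6, 11, 6, 3, 7, 3, 1]

def preorder(dfv):
    """PREV: keep each vertex at its first occurrence in DFV."""
    seen, order = set(), []
    for v in dfv:
        if v not in seen:
            seen.add(v)
            order.append(v)
    return order

def postorder(dfv):
    """POSV: keep each vertex at its last occurrence in DFV."""
    last = {v: i for i, v in enumerate(dfv)}  # later positions overwrite earlier ones
    return sorted(last, key=last.get)

print(preorder(DFV))   # [1, 2, 4, 5, 3, 6, 8, 9, 10, 11, 7]
print(postorder(DFV))  # [4, 5, 2, 8, 9, 10, 11, 6, 7, 3, 1]
```

Filtering either list down to the leaves 4, 5, 8, 9, 10, 11 gives the same order, as observed above.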
We now return to the problem of creating a traversal algorithm. One way to imagine
doing this is to generate the depth first sequence of vertices and/or edges of the tree as 
done in the preceding example. We can describe our traversal more precisely by giving an 
algorithm. Here is one which traverses a tree whose leaves are associated with functions 
and lists the functions in the order of PREV(T). 


Theorem 1 (Systematic Traversal Algorithm) The following procedure systematically
visits the vertices of a tree T in depth-first order, DFV(T), listing the leaves as they
occur in the list DFV(T).


1. Start: Mark all edges as unused and position yourself at the root. 
2. Leaf: If you are at a leaf, list the function. 


3. Decide case: If there are no unused edges leading out from the vertex, go to 
Step 4; otherwise, go to Step 5. 


4. Backtrack: If you are at the root, STOP; otherwise, return to the vertex just 
above this one and go to Step 3. 


5. Decision: Select the leftmost unused edge out of this vertex, mark it used, follow 
it to a new vertex and go to Step 2. 


Step 4 is labeled Backtrack. What does this mean? If you follow the arrows in the tree 
pictured in Example 5, backtracking corresponds to going toward the root on an edge that 
has already been traversed in the opposite direction. In other words, backtracking refers 
to the process of moving along an edge back toward the root of the tree. Thinking in 
terms of the decision sequence, backtracking corresponds to undoing (i.e., backtracking on) 
a decision previously made. Notice that the algorithm only needs to keep track of the 
decisions from the root to the present vertex — when it backtracks, it can “forget” the 
decision it is backtracking from. You should take the time now to apply the algorithm to 
Example 1, noting which decisions you need to remember at each time. 


So far in our use of decision trees, it has always been clear which decisions are reasonable,
that is, which decisions lie on a path to a solution. (In this case every leaf is a solution.) This is because we’ve
looked only at simple problems such as listing all permutations of n or listing all functions
in n^k. We now look at a situation where that is not the case.


Example 6 (Restricted permutations) Consider the following problem. 


List all permutations f of n such that
|f(i) − f(i+1)| ≤ 3 for 1 ≤ i < n.


It’s not at all obvious what decisions are reasonable in this case. For instance, when n = 9,
the partially specified one-line function 124586 cannot be completed to a permutation.


There is a simple cure for this problem: We will allow ourselves to make decisions
which lead to “dead ends,” situations where we cannot continue on to a solution. With this
expanded notion of a decision tree, there are often many possible decision trees that appear
reasonable for doing something. Suppose that we’re generating things in lex order and we’ve
reached the vertex 12458. What do we do now? We’ll simply continue to generate more of
the permutation, making sure that |f(i) − f(i+1)| ≤ 3 is satisfied for that portion of the
permutation we have generated so far. The resulting portion of the tree that starts with
the partial permutation 12458 is represented by the following decision tree. Because the
names of the vertices are so long, every vertex except the root is labeled only with the
function value just added. Thus the rightmost circled vertex is 124589763.


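12458
├─ 6
│  ├─ 3  {7, 9}
│  ├─ 7
│  │  └─ 9  {3}
│  └─ 9
│     └─ 7  {3}
├─ 7
│  ├─ 6
│  │  ├─ 3  {9}
│  │  └─ 9  {3}
│  └─ 9
│     └─ 6
│        └─ (3)
└─ 9
   ├─ 6
   │  ├─ 3  {7}
   │  └─ 7  {3}
   └─ 7
      └─ 6
         └─ (3)
(Circled leaves, the solutions, are shown as (3); each set in braces lists the values still unused at a dead-end leaf.)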


Each vertex is labeled with an additional symbol of the permutation after the symbols
12458. The circled leaves represent solutions. Each solution is obtained by starting with
12458 and adding all symbols on vertices of the path from the root, 12458, to the circled
leaf. Thus the two solutions are 124587963 and 124589763. The leaves that are not circled
are places where there is no way to extend the partial permutation corresponding to that
leaf such that |f(i) − f(i+1)| ≤ 3 is satisfied. At each leaf that is not a solution, you
can see the set of values available for completing the partial permutation at that leaf. It is
clear in those cases that no completion satisfying |f(i) − f(i+1)| ≤ 3 is possible.
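The tree above is exactly what a backtracking search explores. The following Python sketch (the name `extend` and its `max_gap` parameter are hypothetical) starts from a nonempty, already valid partial permutation, tries the unused values in increasing order so that solutions come out in lex order, and records every leaf as either a solution or a dead end.

```python
def extend(prefix, n, max_gap=3):
    """Backtrack from a valid, nonempty partial permutation of 1..n.

    Returns (solutions, dead_ends): the completions with |f(i) - f(i+1)| <= max_gap
    everywhere, and the leaves at which no allowed value remains.
    """
    solutions, dead_ends = [], []

    def visit(p):
        if len(p) == n:                           # leaf that is a solution
            solutions.append("".join(map(str, p)))
            return
        choices = [x for x in range(1, n + 1)     # increasing order: lex order
                   if x not in p and abs(p[-1] - x) <= max_gap]
        if not choices:                           # leaf that is a dead end
            dead_ends.append("".join(map(str, p)))
            return
        for x in choices:
            visit(p + [x])                        # decide, explore, then backtrack

    visit(list(prefix))
    return solutions, dead_ends

sols, dead = extend([1, 2, 4, 5, 8], 9)
print(sols)       # ['124587963', '124589763']
print(len(dead))  # 7
```

The seven dead ends it reports are the seven uncircled leaves of the tree rooted at 12458.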


Had we been smarter, we might have come up with a simple test that would have told
us that 124586 could not be extended to a solution. This would have led to a different
decision tree in which the vertex corresponding to the partial permutation 124586 would
have been a leaf.


You should note here that the numbers 3, 6, 7, 9 are labels on the vertices of this tree. 
A vertex of this tree is the partial permutation gotten by concatenating (attaching) the 
labels on the path from the root to that vertex to the permutation 12458. Thus, 124586 is a 
vertex with label 6. The path from the root to that vertex is 12458, 124586 (corresponding 
to the edge (12458, 124586)). The vertices are not explicitly shown, but can be figured out 
from the rules just mentioned. The labels on the vertices correspond to the decisions to be 
made (how to extend the partial permutation created thus far). 


Our tree traversal algorithm, Theorem 1, requires a slight modification to cover this 
type of extended decision tree concept where a leaf need not be a solution: Change Step 2 
to 


2’. Leaf: If you are at a leaf, take appropriate action. 


For the tree rooted at 12458 in this example, seven of the leaves are not solutions and two
are. For the two that are, the “appropriate action” is to print them out, jump and shout,
or, in some way, proclaim success. For the leaves that are not solutions, the appropriate
action is to note failure in some way (or just remain silent) and backtrack (Step 4
of Theorem 1). This backtracking on leaves that are not solutions is where this type of
algorithm gets its name: “backtracking algorithm.”


How can there be more than one decision tree for generating solutions in a specified
order? Suppose someone who was not very clever wanted to generate all permutations of n
in lex order. He might program a computer to generate all functions in n^n in lex order and
then discard those functions which are not permutations. This leads to a much bigger
tree because n^n is much bigger than n!, even when n is as small as 3 (27 versus 6). A somewhat cleverer
friend might suggest that he have the program check that f(k) ≠ f(k − 1) for each
k > 1. This won’t slow down the program very much and will lead to only n(n − 1)^{n−1}
functions. Thus the program should run faster. Someone else might suggest that the
programmer check at each step that the function produced so far is an injection. If
this is done, nothing but permutations will be produced, but the program may be much
slower.
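A quick calculation shows how fast the three trees grow. This Python sketch (the helper name `tree_sizes` is illustrative) counts the functions each strategy produces: all of n^n, the n(n − 1)^{n−1} functions with f(k) ≠ f(k − 1), and the n! permutations. It counts leaves only and says nothing about the time spent per vertex, which is the other side of the trade-off.

```python
from math import factorial

def tree_sizes(n):
    """Functions produced by the three strategies described above."""
    all_functions = n ** n                       # no checking at all
    no_equal_neighbors = n * (n - 1) ** (n - 1)  # check f(k) != f(k-1) for k > 1
    injections = factorial(n)                    # full injectivity check
    return all_functions, no_equal_neighbors, injections

for n in (3, 5, 8):
    print(n, tree_sizes(n))
# 3 (27, 12, 6)
# 5 (3125, 1280, 120)
# 8 (16777216, 6588344, 40320)
```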


The lesson to be learned from this discussion is that there is often a trade-off
between the size of the decision tree and the time that must be spent at each vertex
determining what decisions to allow. Because of this, different people may develop different
decision trees for the same problem. The differences between computer run times for
different decision trees can be truly enormous. By carefully defining the criteria that allow
one to decide that a vertex is a leaf, people have changed problems that took too long to
run on a supercomputer into problems that could be easily run on a personal computer.
We’ll conclude this section with another example of backtracking of the type just discussed.


Example 7 (Domino coverings) We are going to consider the problem of covering an
m by n board (for example, m = n = 8 gives a chess board) with 1 by 2 rectangles (called
“dominoes”). A domino can be placed either horizontally or vertically so that it covers two
squares and does not overlap another domino. Here is a picture of the situation for m = 3,
n = 4. (The sequences of h’s and v’s under the eleven covered boards will be explained below.)


[Figure: the empty 3 x 4 board and its eleven domino coverings; legend: h = horizontal domino, v = vertical domino. The codes under the coverings are hhhhhh, hhhvvh, hhvhvh, hhvvhh, hhvvvv, hvvhhh, hvvvvh, vhvhhh, vvhhhh, vvhvvh, vvvvhh.]


If the squares of the board are numbered systematically, left to right, top to bottom,
from 1 to 12, we can describe any placement of dominoes by a sequence of 6 h’s and v’s:
Each of the domino placements in the above picture has such a description just below it.
Take as an example hhvhvh (the third domino covering in the picture). We begin with no
dominoes on the board. None of the squares, numbered 1 to 12, are covered. The list of
“unoccupied squares” is as follows:


1 2 3 4
5 6 7 8
9 10 11 12


Thus, the smallest unoccupied square is 1. The first symbol in hhvhvh is h. That
means that we take a horizontal domino and cover the square 1 with it. That forces us to
cover square 2 also. The list of unoccupied squares is as follows:


3 4
5 6 7 8
9 10 11 12


Now the smallest unoccupied square is 3. The second symbol in hhvhvh is also an h. Cover
square 3 with a horizontal domino, forcing us to cover square 4 also. The list of unoccupied
squares is as follows:


5 6 7 8 
9 10 11 12 


At this point, the first row of the board is covered with two horizontal dominoes (check 
the picture). Now the smallest unoccupied square is 5 (the first square in the second row). 
The third symbol in hhvhvh is v. Thus we cover square 5 with a vertical domino, forcing 
us to cover square 9 also. The list of unoccupied squares is as follows: 


6 7 8 
10 11 12 


We leave it to you to continue this process to the bitter end and obtain the domino covering 
shown in the picture. 


Here is the general description of the process. Place dominoes sequentially as follows.
If the first unused element in the sequence is h, place a horizontal domino on the first
unoccupied square and the square to its right. If the first unused element in the sequence is
v, place a vertical domino on the first unoccupied square and the square just below it. Not
all sequences correspond to legal placements of dominoes (try hhhhhv: the final v would hang
off the bottom of the board). For a 2 x 2 board,
the only legal sequences are hh and vv. For a 2 x 3 board, the legal sequences are hvh, vhh
and vvv. For a 3 x 4 board, there are eleven legal sequences, as shown in the picture at the
start of this example.
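The placement rule is easy to mechanize. In this Python sketch the helpers `place` and `is_covering` are illustrative names, and the board is a list of rows in which each cell holds the number of the domino covering it (0 if unoccupied). `place` returns `None` as soon as a domino would leave the board or overlap another one.

```python
def place(seq, rows, cols):
    """Place dominoes according to an h/v sequence; return the board or None if illegal."""
    board = [[0] * cols for _ in range(rows)]
    for k, move in enumerate(seq, start=1):
        # first unoccupied square, reading left to right, top to bottom
        free = [(r, c) for r in range(rows) for c in range(cols) if board[r][c] == 0]
        if not free:
            return None                   # more symbols than the board can take
        r, c = free[0]
        r2, c2 = (r, c + 1) if move == "h" else (r + 1, c)
        if r2 >= rows or c2 >= cols or board[r2][c2] != 0:
            return None                   # off the board or overlapping
        board[r][c] = board[r2][c2] = k
    return board

def is_covering(seq, rows, cols):
    """True if seq places dominoes legally and leaves no square uncovered."""
    board = place(seq, rows, cols)
    return board is not None and all(cell for row in board for cell in row)

for row in place("hhvhvh", 3, 4):
    print(row)
# [1, 1, 2, 2]
# [3, 4, 4, 5]
# [3, 6, 6, 5]
```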


To find these sequences in lex order we used a decision tree for generating sequences 
of h’s and v’s in lex order. Each decision is required to lead to a domino that lies entirely 
on the board and does not overlap another domino. Here is our decision tree:


[Figure: the decision tree for domino coverings of the 3 x 4 board; each vertex is labeled with the decision, h or v, that led to it.]


Note that in this tree, the decision (label) that led to a vertex is placed at the vertex rather 
than on the edge. The actual vertices, not explicitly labeled, are the sequences of choices 
from the root to that vertex (e.g., the vertex hvv has label v). The leaf vhvvv associated 
with the path v,h,v,v,v does not correspond to a covering. It has been abandoned (i.e., 
declared a leaf but not a solution) because there is no way to place a domino on the lower 
left square of the board, which is the first free square. Draw a picture of the board to see 
what is happening. Our criterion for deciding if a vertex is a leaf is to check if that vertex 
corresponds to a solution or to a placement that does not permit another domino to be 
placed on the board. It is not hard to come up with a criterion that produces a smaller 
decision tree. For example, vhvv leaves the lower left corner of the board isolated. That 
means that vhvv cannot be extended to a solution, even though more dominoes can be 
placed on the board. But checking this more restrictive criterion is more time-consuming.
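The whole search can be written as a short backtracking routine. This Python sketch (the function name `coverings` is hypothetical) uses the simple leaf criterion from the text: a vertex is abandoned only when the first free square can take neither a horizontal nor a vertical domino. Trying h before v at each step produces the coverings in lex order.

```python
def coverings(rows, cols):
    """Return (solutions, dead_ends) as h/v strings, with solutions in lex order."""
    board = [[False] * cols for _ in range(rows)]
    found, dead_ends = [], []

    def first_free():
        for r in range(rows):
            for c in range(cols):
                if not board[r][c]:
                    return r, c
        return None

    def visit(seq):
        spot = first_free()
        if spot is None:                    # board full: a solution
            found.append(seq)
            return
        r, c = spot
        moved = False
        for move, (r2, c2) in (("h", (r, c + 1)), ("v", (r + 1, c))):  # h before v
            if r2 < rows and c2 < cols and not board[r2][c2]:
                moved = True
                board[r][c] = board[r2][c2] = True    # make the decision
                visit(seq + move)
                board[r][c] = board[r2][c2] = False   # backtrack: undo it
        if not moved:
            dead_ends.append(seq)           # a leaf that is not a solution

    visit("")
    return found, dead_ends

sols, dead = coverings(3, 4)
print(len(sols))        # 11
print("vhvvv" in dead)  # True
```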


Exercises for Section 1 


1.1. List the nonroot vertices of the decision tree in Example 2 in PREV, POSV and 
BFV orders. 


1.2. Let RANK_L denote the rank in lex order and let RANK_I denote the rank in insertion
order on permutations of n. Answer the following questions and give reasons
for your answers:


(a) For n = 3 and n = 4, which permutations σ have RANK_L(σ) = RANK_I(σ)?
(b) What is RANK_L(2314)? RANK_L(45321)?

(c) What is RANK_I(2314)? RANK_I(45321)?

(d) What permutation σ of 4 has RANK_L(σ) = 15?

(e) What permutation σ of 4 has RANK_I(σ) = 15?
(f) What permutation σ of 5 has RANK_L(σ) = 15?


1.3. Draw the decision tree to list all sequences of length six of A’s and B’s that satisfy
the following conditions:


• There are no two adjacent A’s.
• There are never three B’s adjacent.
• If each leaf is thought of as a word, the leaves are in alphabetical order.


1.4. Draw a decision tree for D(6^4), the strictly decreasing functions from 4 to 6. You
should choose a decision tree so that the leaves are in lex order when read from left
to right.


(a) What is the rank of 5431? of 6531?
(b) What function has rank 0? rank 7?

(c) Your decision tree should contain the decision tree for D(5^4). Indicate it and
use it to list those functions in lex order.

(d) Indicate how all of the parts of this exercise can be interpreted in terms of
subsets of a set.


1.5. Modify Theorem 1 to list all vertices in PREV order. Do the same for POSV order.


1.6. The president of Hardy Hibachi Corporation decided to design a series of different
grills for his square-topped hibachis. They were to be collectibles. He hoped
his customers would want one of each different design (and spend big bucks to
get them). His undergraduate summer intern, who had studied combinatorics in college,
suggested that these grills be modeled after the patterns associated with domino
arrangements on 4 x 4 boards. Their favorite grill was the design with
the code vvhvvhhh. The student, looking at some old class notes, suggested seven
other designs: vvhhvhvh, hhvvhvvh, vhvhhvvh, hvvhvhvh, hvvvvhhh, vhvhvvhh,
hhhvvvh. These eight grills were fabricated out of sturdy steel rods, put in a box,
and shipped to the boss. When he opened up the box, much to his disgust, he found
that all of the grills were the same. What went wrong? How should the collection
of different grills be designed? (This is called an isomorph rejection problem.)


The favorite grill: vvhvvhhh. [Figure: the corresponding domino arrangement on a 4 x 4 board.]
Section 2: Recursive Algorithms


A recursive algorithm is an algorithm that refers to itself when it is executing. As with any
recursive situation, when an algorithm refers to itself, it must be with “simpler” parameters
so that it eventually reaches one of the “simplest” cases, which is then done without
recursion. Let’s look at a couple of examples before we try to formalize this idea.


Example 8 (A recursive algorithm for 0-1 sequences) Suppose you are interested 
in listing all sequences of length eight, consisting of four zeroes and four ones. Suppose 
that you have a friend who does this sort of thing, but will only make such lists if the 
length of the sequence is seven or less. “Nope,” he says, “I can’t do it — the sequence is 
too long.” There is a way to trick your friend into doing it. First give him the problem of 
listing all sequences of length seven with three ones. He doesn’t mind, and gives you the 
list 1110000, 1011000, 0101100, etc. that he has made. You thank him politely, sneak off, 
and put a “1” in front of every sequence in the list he has given you to obtain 11110000, 
11011000, 10101100, etc. Now, you return to him with the problem of listing all strings of 
length seven with four ones. He returns with the list 1111000, 0110110, 0011101, etc. Now 
you thank him and sneak off and put a “0” in front of every sequence in the list he has
given you to obtain 01111000, 00110110, 00011101, etc. Putting these two lists together, 
you have obtained the list you originally wanted. 


How did your friend produce these lists that he gave you? Perhaps he had a friend
who would only do lists of length 6 or less, and he tricked this friend in the same way you
tricked him! Perhaps the “6 or less” friend had a “5 or less” friend that he tricked, etc. If
you are sure that your friend gave you a correct list, it doesn’t really matter how he got
it. □
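The trick played on the friend is a recursion. In this Python sketch the function `sequences(n, k)` (an illustrative name) plays the friend: a list of length n with k ones is built from the two shorter lists, prefixing “1” to the (n − 1, k − 1) list and “0” to the (n − 1, k) list, exactly as in the example.

```python
def sequences(n, k):
    """All 0-1 strings of length n with exactly k ones, built as in Example 8."""
    if k < 0 or k > n:
        return []                  # impossible request: nothing to list
    if n == 0:
        return [""]                # the empty string (here k must be 0)
    # ask the "shorter" friend twice, then put a 1 or a 0 in front
    return (["1" + s for s in sequences(n - 1, k - 1)] +
            ["0" + s for s in sequences(n - 1, k)])

eight = sequences(8, 4)
print(len(eight))  # 70
print(eight[:3])   # ['11110000', '11101000', '11100100']
```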


Next we consider an example from sorting theory. We imagine we are given a set
of objects which have a linear order defined on them (perhaps, but not necessarily,
lexicographic order of some sort). As a concrete example, we could imagine that we are
given a set of integers S, perhaps a large number of them. They are not in order as presented
to us, but we want to list them in order, smallest to largest. That problem of putting the
set S in order is called sorting S. On the other hand, if we are given two ordered lists,
like (25, 235, 2333, 4321) and (21, 222, 2378, 3421, 5432), and want to put the combined list
in order, in this case (21, 25, 222, 235, 2333, 2378, 3421, 4321, 5432), this process is called
merging the two lists. Our next example considers the relationship between sorting and
merging.
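Merging needs only one pass over the two lists, because the smallest remaining item is always at the front of one of them. Here is a minimal Python sketch of that idea (the name `merge` is illustrative), applied to the two lists above.

```python
def merge(a, b):
    """Merge two already sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:        # take the smaller front item (ties: from a first)
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    return result + a[i:] + b[j:]  # at most one of these tails is nonempty

print(merge([25, 235, 2333, 4321], [21, 222, 2378, 3421, 5432]))
# [21, 25, 222, 235, 2333, 2378, 3421, 4321, 5432]
```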


Example 9 (Sorting by recursive merging) Sorting by recursive merging, called
merge sorting, can be described as follows.


• The lists containing just one item are the simplest and they are already sorted.
• Given a list of n > 1 items, choose k with 1 ≤ k < n, sort the first k items, sort the
last n − k items and merge the two sorted lists.
This algorithm builds up a way to sort an n-list out of procedures for sorting shorter lists.
Note that we have not specified how the first k or last n − k items are to be sorted; we
simply assume that it has been done. Of course, an obvious way to do this is to simply
apply our merge sorting algorithm to each of these sublists.


Let’s implement the algorithm using people rather than a computer. Imagine training 
a large number of obedient people to carry out two tasks: (a) splitting a list for other 
people to sort and (b) merging two lists. We give one person the unsorted list and tell him 
to sort it using the algorithm and return the result to us. What happens? 


• Anyone who has a list with only one item returns it unchanged to the person he received
it from.


• Anyone with a list having more than one item splits it and gives each piece to a person
who has not received a list, telling each person to sort it and return the result. When
the results have been returned, this person merges the two lists and returns the result
to whoever gave him the list.


If there are enough obedient people around, we'll eventually get our answer back. 


Notice that no one needs to pay any attention to what anyone else is doing to a list. 
This makes a local description possible; that is, we tell each person what to do and they do 
not need to concern themselves with what other people are doing. This can also be seen in 
the pseudocode for merge sorting a list L: 


Sort(L)
    If the length of L is 1, return L
    Else
        Split L into two lists L1 and L2
        S1 = Sort(L1)
        S2 = Sort(L2)
        S = Merge(S1, S2)
        Return S
    End if
End


The procedure is not concerned with what goes on when it calls itself recursively. This is
very much like proof by induction. (We discuss proof by induction in the last section of this
unit.) To see that, let’s prove that the algorithm sorts correctly. We assume that splitting
and merging have been shown to be correct; that’s a separate problem. We induct on
the length n of the list. The base case, n = 1, is handled correctly by the program since it
returns the list unchanged. Now for induction. Splitting L results in shorter lists and so,
by the induction hypothesis, S1 and S2 are sorted. Since merging is done correctly, S is
also sorted.


This algorithm is another case of divide and conquer since it splits the sorting problem 
into two smaller sorting problems whose answers are combined (merged) to obtain the 
solution to the original sorting problem. □
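The Sort(L) pseudocode of Example 9 translates almost line for line into Python. In this sketch the merge step is delegated to the standard library's `heapq.merge`, which lazily merges already sorted iterables; the choice k = n // 2 is one admissible split among all 1 ≤ k < n. The base case is widened to length at most 1 so that an empty list is also handled.

```python
from heapq import merge  # lazily merges already sorted iterables

def merge_sort(items):
    """Sort a list by recursive merging, following the Sort(L) pseudocode."""
    if len(items) <= 1:                  # simplest case: already sorted
        return list(items)
    k = len(items) // 2                  # any 1 <= k < n would do
    s1 = merge_sort(items[:k])           # sort the first k items
    s2 = merge_sort(items[k:])           # sort the last n - k items
    return list(merge(s1, s2))           # merge the two sorted lists

print(merge_sort([2333, 21, 4321, 25, 5432, 235]))
# [21, 25, 235, 2333, 4321, 5432]
```

Splitting in half rather than, say, k = 1 keeps the recursion depth near log2(n).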


Let’s summarize some of the above observations with two definitions. 
Definition 2 (Recursive approach) A recursive approach to a problem consists of two
parts: 


1. The problem is reduced to one or more problems of the same kind which are simpler 
in some sense. 


2. There is a set of simplest problems to which all others are reduced after one or more 
steps. Solutions to these simplest problems are given. 


The preceding definition focuses on tearing down (reduction to simpler cases). Sometimes 
it may be easier or better to think in terms of building up (construction of bigger cases): 


Definition 3 (Recursive solution) We have a recursive solution to the problem (proof, 
algorithm, data structure, etc.) if the following two conditions hold. 


1. The set of simplest problems can be dealt with (proved, calculated, sorted, etc.). 


2. The solution to any other problem can be built from solutions to simpler problems, 
and this process eventually leads back to the original problem. 


The recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k) for computing binomial coefficients
can be viewed as a recursive algorithm. Such algorithms for computing can be turned into
algorithms for constructing the things we are counting. To do this, it helps to have a more
systematic way to think about recursive algorithms. In the next example we introduce a
tree to represent the local description of a recursive algorithm.
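As a counting algorithm, the recursion needs only its simplest cases, C(n, 0) = C(n, n) = 1, and the rule that C(n, k) = 0 when k is out of range. In the Python sketch below, `lru_cache` is only a speed-up that remembers values already computed; the two recursive calls correspond to the two requests made to the friend in Example 8.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def C(n, k):
    """Binomial coefficient from C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:      # simplest cases
        return 1
    return C(n - 1, k - 1) + C(n - 1, k)

print(C(8, 4))                       # 70
print([C(5, k) for k in range(6)])   # [1, 5, 10, 10, 5, 1]
```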


Example 10 (Permutations in lex order) The following figure represents the local 
description of a decision tree for listing the permutations of an ordered set 


S = {s_1, s_2, ..., s_n} with s_1 < s_2 < ... < s_n.


The permutations in the figure are listed in one-line form. The vertices of this decision tree 
are of the form L(X) where X is some set. The simplest case, shown below, is where the 
tree has one edge. The labels on the edges are of the form (t), where t is an element of the 
set X associated with the uppermost vertex L(X) incident on that edge. 


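[Figure: left, the simplest case L({s}), a single edge labeled (s); right, the general case L(S), with edges labeled (s_1), (s_2), ..., (s_n) leading to L(S − {s_1}), L(S − {s_2}), ..., L(S − {s_n}).]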


The leaves of the recursive tree tell us to construct permutations of the set S with the already 
chosen element removed from the set. (This is because permutations are injections.) 
One way to think of the local description is to regard it as a rule for recursively
constructing an entire decision tree, once the set S is specified. Here this construction has
been carried out for S = {1, 2, 3, 4}.


[Figure: the decision tree rooted at L(1,2,3,4), built from the local description.]


To obtain a permutation of 4, read the labels (t) on the edges from the root to a particular
leaf. For example, if this is done for the first leaf in preorder, one obtains (1)(2)(3)L(4).
L(4) is a “simplest case” and has the label (4), giving the permutation 1234 in one-line
notation. Repeating this process for the leaves from left to right gives the list of permutations
of 4 in lex order. For example, the tenth leaf gives the permutation 2341.
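The local description can be run directly. In this Python sketch the function `L` mirrors the vertex label L(S): the edges out of L(S) are taken in increasing order of their labels (t), each leading to L(S − {t}), and the simplest case L({s}) contributes the single label (s). Reading the leaves left to right yields the permutations in lex order.

```python
def L(S):
    """Permutations of the ordered set S, in lex order, via the local description."""
    S = sorted(S)
    if len(S) == 1:                      # simplest case L({s}): one edge (s)
        return [[S[0]]]
    return [[t] + rest                   # edge labeled (t), then the subtree L(S - {t})
            for t in S
            for rest in L([x for x in S if x != t])]

perms = ["".join(map(str, p)) for p in L({1, 2, 3, 4})]
print(perms[:3])  # ['1234', '1243', '1324']
print(perms[9])   # 2341, the tenth leaf
```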


We’ll use induction to prove that this is the correct tree. When n = 1, it is clear.
Suppose it is true for all S with cardinality less than n. The permutations of S in lex
order are those beginning with s_1, followed by those beginning with s_2, and so on. If s_k
is removed from those permutations of S beginning with s_k, what remains is the set of
permutations of S − {s_k} listed in lex order. By the induction hypothesis, these are given
by L(S − {s_k}). Note that the validity of our proof does not depend on how they are given
by L(S − {s_k}). □


No discussion of recursion would be complete without the entertaining example of the
Towers of Hanoi puzzle. We shall explore additional aspects of this problem in the exercises.
Our approach will be the same as in the previous example. We shall give a local description
of the recursion. Having done so, we construct the trees for some examples and try to gain
insight into the sequence of moves associated with the general Towers of Hanoi problem.


Example 11 (Towers of Hanoi) The Towers of Hanoi puzzle consists of n different 
sized washers (i.e., discs with holes in their centers) and three poles. Initially the washers 
are stacked on one pole as shown below.


[Figure: three configurations of washers on the poles S, E and G, labeled (a), (b) and (c).]


The object is to move all of the washers from pole S to pole G, using pole E as a place
for temporary placement of washers. A legal move consists of taking the top washer from a
pole and placing it on top of the pile on another pole, provided it is not placed on a smaller
washer. Configuration (a), above, is the starting configuration, (b) is an intermediate stage,
and (c) is illegal.


We want an algorithm H(n, S, E, G) that takes washers numbered 1,2,...,n that are 
stacked on the pole called S and moves them to the pole called G. The pole called E is also 
available. A call of this procedure to move 7 washers might be H(7, “start”, “extra”, “goal”).

Here is a recursive description of how to solve the Towers of Hanoi. To move the
largest washer, we must first move the other n − 1 to the spare pole. After moving the largest,
we can then move the other n − 1 on top of it. Let the washers be numbered 1 to n from
smallest to largest. When we are moving any of the washers 1 through k, we can ignore
the presence of all larger washers beneath them. Thus, moving washers 1 through n − 1
from one pole to another when washer n is present uses the same moves as moving them
when washer n is not present. Since the problem of moving washers 1 through n − 1 is
simpler, we practically have a recursive description of a solution. All that’s missing is the
observation that the simplest case, n = 1, is trivial. The following diagram gives the
local description of a decision tree that represents this recursive algorithm.


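[Figure: local description of the Towers of Hanoi decision tree. Simplest case: H(1, X, Y, Z) has a single child, the leaf X →1 Z (move washer 1 from X to Z). General case: H(n, X, Y, Z) has three children, left to right: H(n−1, X, Z, Y), the leaf X →n Z, and H(n−1, Y, X, Z).]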


The “simplest case,” n equals 1, is shown on the left. The case for general n is designated 
by H(n, X, Y, Z). You can think of the symbol H(n, X, Y, Z) as designating a vertex of 
a decision tree. The local description tells you how to construct the rest of the decision 
tree, down to and including the simplest cases. There is a simple rule for deciding how to 
rearrange X, Y and Z: 
• for the left child: X is fixed, Y and Z are switched;
• for the right child: Z is fixed, X and Y are switched.


All leaves of the decision tree are designated by symbols of the form “U →k V”. This symbol
has the interpretation “move washer number k from pole U to pole V.” These leaves in
preorder (left to right order in the tree) give the sequence of moves needed to solve the
Towers of Hanoi puzzle. The local description tells us that, in order to list the leaves of
H(n, S, E, G), we


• list the leaves of H(n−1, S, G, E), moving the top n−1 washers from S to E using G;
• move the largest washer from S to G;
• list the leaves of H(n−1, E, S, G), moving the top n−1 washers from E to G using S.


For example, the leaves of the tree with root H(2, S, E, G) are, in order,
S →1 E, S →2 G, E →1 G.


The leaves of H(3, S, E, G) are gotten by concatenating (piecing together) the leaves of
the subtree rooted at H(2, S, G, E) with S →3 G and the leaves of the subtree rooted at
H(2, E, S, G). This gives


S →1 G, S →2 E, G →1 E, S →3 G, E →1 S, E →2 G, S →1 G. □
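The local description is itself a program. In this Python sketch (the names `H` and `show` are illustrative), `H(n, X, Y, Z)` returns the leaves of the tree rooted at H(n, X, Y, Z) in preorder, each leaf encoded as a tuple (k, U, V) meaning U →k V.

```python
def H(n, X, Y, Z):
    """Leaves of the decision tree rooted at H(n, X, Y, Z), left to right.

    Each leaf (k, U, V) means "move washer k from pole U to pole V".
    """
    if n == 1:
        return [(1, X, Z)]                    # simplest case: leaf X ->1 Z
    return (H(n - 1, X, Z, Y)                 # left child: X fixed, Y and Z switched
            + [(n, X, Z)]                     # middle leaf: move the largest washer
            + H(n - 1, Y, X, Z))              # right child: Z fixed, X and Y switched

def show(moves):
    return ", ".join(f"{u} →{k} {v}" for k, u, v in moves)

print(show(H(2, "S", "E", "G")))  # S →1 E, S →2 G, E →1 G
print(show(H(3, "S", "E", "G")))
# S →1 G, S →2 E, G →1 E, S →3 G, E →1 S, E →2 G, S →1 G
```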


Example 12 (The Towers of Hanoi decision tree for n = 4) Starting with the local
description of the general decision tree for the Towers of Hanoi and applying the rules of
construction specified by it, we obtain the decision tree for the Towers of Hanoi puzzle with
n = 4. For example, we start with n = 4, X = S, Y = E and Z = G at the root of the tree.
To match the H(n, X, Y, Z) pattern when we expand the rightmost child of the root (namely
H(3, E, S, G)), we have n = 3, X = E, Y = S and Z = G.


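H(4, S, E, G)
├─ H(3, S, G, E)
│  ├─ H(2, S, E, G)
│  │  ├─ H(1, S, G, E)
│  │  │  └─ S →1 E
│  │  ├─ S →2 G
│  │  └─ H(1, E, S, G)
│  │     └─ E →1 G
│  ├─ S →3 E
│  └─ H(2, G, S, E)
│     ├─ H(1, G, E, S)
│     │  └─ G →1 S
│     ├─ G →2 E
│     └─ H(1, S, G, E)
│        └─ S →1 E
├─ S →4 G
└─ H(3, E, S, G)
   ├─ H(2, E, G, S)
   │  ├─ H(1, E, S, G)
   │  │  └─ E →1 G
   │  ├─ E →2 S
   │  └─ H(1, G, E, S)
   │     └─ G →1 S
   ├─ E →3 G
   └─ H(2, S, E, G)
      ├─ H(1, S, G, E)
      │  └─ S →1 E
      ├─ S →2 G
      └─ H(1, E, S, G)
         └─ E →1 G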




There are fifteen leaves. You should apply the moves specified by these leaves, in order,
to the starting configuration ((a) of Example 11) to see that the rules are followed and the
washers are all transferred to G.


There are some observations we can make from this example. There are 2^4 − 1 = 15
leaves or, equivalently, “moves” in transferring all washers from S to G. If h_n is the number
of moves required for n washers, then the local description of the decision tree implies that,
in general, h_n = 2h_{n−1} + 1. Computing some numbers for the h_n gives 1, 3, 7, 15, 31, etc.
It appears that h_n = 2^n − 1. This fact can be proved easily by induction.
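The induction takes only one line. The sketch below uses nothing beyond the base case h_1 = 1 and the recursion h_n = 2h_{n−1} + 1 read off the local description; the second line applies the induction hypothesis h_{n−1} = 2^{n−1} − 1.

```latex
\begin{aligned}
h_1 &= 1 = 2^1 - 1,\\
h_n &= 2h_{n-1} + 1 = 2\left(2^{n-1} - 1\right) + 1 = 2^n - 1 \qquad (n > 1).
\end{aligned}
```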


Note that washer number 1, the smallest washer, moves every other time. It moves
in a consistent pattern. For H(4, S, E, G), it starts on S, then moves to E, then G, then S, then E, then G, etc. For
H(3, S, E, G), the pattern is S, G, E, S, G, E, etc. In fact, for n odd, the pattern is always
S, G, E, S, G, E, etc. For n even, the pattern is always S, E, G, S, E, G, etc. This means
that if someone shows you a configuration of washers on poles for H(n, S, E, G) and
says to you, “It’s the smallest washer’s turn to move,” then you should be able to make
the move. If they tell you it is not the smallest washer’s turn, then you should also be able
to make the move. Why? Only one move not involving the smallest washer is legal! □
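These observations give a way to solve the puzzle with no recursion at all. The Python sketch below (the function name `iterative_hanoi` is hypothetical) moves the smallest washer around the cycle S, G, E (n odd) or S, E, G (n even) on odd-numbered moves. On even-numbered moves it makes the one legal move between the two poles not holding the smallest washer. This is a swapped-in technique, not the decision-tree method of the text, but it yields the same sequence of moves.

```python
def iterative_hanoi(n, S="S", E="E", G="G"):
    """Solve H(n, S, E, G) without recursion; return moves as (washer, from, to)."""
    poles = {S: list(range(n, 0, -1)), E: [], G: []}  # bottom ... top
    cycle = [S, G, E] if n % 2 == 1 else [S, E, G]    # path of the smallest washer
    where = 0                                         # index in cycle of washer 1
    moves = []
    for move_number in range(1, 2 ** n):
        if move_number % 2 == 1:                      # the smallest washer's turn
            src, dst = cycle[where], cycle[(where + 1) % 3]
            where = (where + 1) % 3
        else:                                         # the only other legal move
            a, b = [p for p in cycle if p != cycle[where]]
            if not poles[a]:
                src, dst = b, a
            elif not poles[b]:
                src, dst = a, b
            else:                                     # smaller top washer moves
                src, dst = (a, b) if poles[a][-1] < poles[b][-1] else (b, a)
        washer = poles[src].pop()
        poles[dst].append(washer)
        moves.append((washer, src, dst))
    return moves

print(iterative_hanoi(2))  # [(1, 'S', 'E'), (2, 'S', 'G'), (1, 'E', 'G')]
```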


Example 13 (The Towers of Hanoi, recursion, and stacks) One way to generate 
the moves of H(n, S, E, G) is to use the local description to generate the depth first vertex 
sequence (Example 5). 


• The depth first vertex list for n = 4 would start as follows:


H(4, S, E, G), H(3, S, G, E), H(2, S, E, G), H(1, S, G, E), S →1 E


At this point we have gotten to the first leaf. It should be printed out.
• The next vertex in the depth first vertex sequence is H(1, S, G, E) again. We represent
this by removing S →1 E to get


H(4, S, E, G), H(3, S, G, E), H(2, S, E, G), H(1, S, G, E).


• Next we remove H(1, S, G, E) to get


H(4, S, E, G), H(3, S, G, E), H(2, S, E, G).


• The next vertex in depth first order is S →2 G. We add this to our list to get


H(4, S, E, G), H(3, S, G, E), H(2, S, E, G), S →2 G.


Continuing in this manner we generate, for each vertex in the decision tree, the path from 
the root to that vertex. The vertices occur in depth first order. Computer scientists refer 
to the path from the root to a vertex v as the stack of v. Adding a vertex to the stack 
is called pushing the vertex on the stack. Removing a vertex is popping the vertex from 
the stack. Stack operations of this sort reflect how most computers carry out recursion. 
This “one dimensional” view of recursion is computer friendly, but the geometric picture 
provided by the local tree is more people friendly. □
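The path stack described above can be made explicit. This minimal Python sketch (the name `hanoi_path_stack` and the frame layout are our own assumptions) keeps, for each vertex on the path from the root, a note of which of its three children comes next, much as a computer's call stack records where to resume after a recursive call. One simplification: a leaf is printed as soon as it is reached rather than being pushed and immediately popped.

```python
def hanoi_path_stack(n, s, e, g):
    """Moves of H(n, s, e, g) found by a depth first walk with an explicit stack.

    The stack always holds the path from the root to the current vertex.
    Each frame is [k, a, b, c, next_child] for the vertex H(k, a, b, c);
    next_child says which child to visit next: 0 = H(k-1, a, c, b),
    1 = the leaf a -> c (washer k), 2 = H(k-1, b, a, c), 3 = done.
    """
    moves = []
    stack = [[n, s, e, g, 0]]            # push the root
    while stack:
        frame = stack[-1]                # the current vertex is on top
        k, a, b, c, child = frame
        if k == 0 or child == 3:
            stack.pop()                  # nothing left below: pop, back to parent
            continue
        frame[4] += 1                    # where to resume when we come back
        if child == 0:
            stack.append([k - 1, a, c, b, 0])      # push the left subtree
        elif child == 1:
            moves.append((k, a, c))                # reached a leaf: output the move
        else:
            stack.append([k - 1, b, a, c, 0])      # push the right subtree
    return moves


print(hanoi_path_stack(3, "S", "E", "G"))
# [(1, 'S', 'G'), (2, 'S', 'E'), (1, 'G', 'E'), (3, 'S', 'G'), (1, 'E', 'S'), (2, 'E', 'G'), (1, 'S', 'G')]
```

At any moment the stack has at most n + 1 frames, which is the "one dimensional" view of the recursion the text describes.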
Example 14 (The Towers of Hanoi configuration analysis) In the figure below, we
show the starting configuration for H(6, S, E, G) and a path,


H(6, S, E, G), H(5, E, S, G), H(4, E, G, S), H(3, G, E, S), G → S.


This path goes from the root H(6, S, E, G) to the leaf G → S. Given this path, we want to
construct the configuration of washers corresponding to that path, assuming that the move
G → S has just been carried out. This is also shown in the figure and we now explain how
we obtained it.


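[Figure: the starting configuration of H(6, S, E, G), with washers 1 to 6 on S; the path H(6, S, E, G), H(5, E, S, G), H(4, E, G, S), H(3, G, E, S), G → S drawn through the local trees, with the parts to the left of the path marked h_5 leaves, 1 leaf, h_3 leaves, 1 leaf, h_2 leaves; and the ending configuration after the move just made, G → S (washers 3 and 4 on S; washers 1, 2 and 5 on E; washer 6 on G), labeled RANK 43, NUMBER 44.]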


The first part of the path shows what happens when we use the local tree for H(6, S, E, G).
Since we are going to H(5, E, S, G), the first edge of the path “slopes down to the right.”
At this point, the left edge, which led to H(5, S, G, E), has moved washers 1 through 5 to
pole E using $h_5 = 2^5 - 1 = 31$ moves, and the move S → G has moved washer 6. This is
all shown in the figure and it has taken 31 + 1 = 32 moves.


Next, one replaces H(5,E,S,G) with the local tree (being careful with the S, E, G 
labels!). This time the path “slopes to the left”. Continuing in this manner we complete 
the entire path and have the configuration that is reached. 


We can compute the rank of this configuration by noticing how many moves were
made to reach it. Each move, except our final one G → S, is a leaf to the left of the leaf
corresponding to the move G → S at the end of the path. We can see from the figure that
there were

$$(h_5 + 1) + 0 + (h_3 + 1) + h_2 = (31 + 1) + 0 + (7 + 1) + 3 = 43$$

such moves and so the rank of this configuration is 43.
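The rank computation amounts to adding up the leaves to the left of the path. Here is a small Python sketch under our own encoding of a path (an assumption, not the book's notation): one letter per level, 'L' or 'R' for the left or right subtree, and a final 'M' for the middle leaf.

```python
def rank_from_path(n, branches):
    """RANK (number of earlier moves) of the leaf at the end of a path in H(n, ...).

    branches has one letter per level, starting at the root H(n, ...):
    'L' = left subtree, 'R' = right subtree, and a final 'M' = middle leaf.
    Going right at a vertex H(k, ...) passes its h_{k-1} left leaves and its
    middle leaf; stopping at the middle leaf passes h_{k-1} left leaves.
    """
    rank = 0
    k = n
    for choice in branches:
        h_left = 2 ** (k - 1) - 1        # h_{k-1} leaves in the left subtree
        if choice == "M":
            return rank + h_left
        if choice == "R":
            rank += h_left + 1
        k -= 1
    raise ValueError("a path to a leaf must end with 'M'")


# The path of Example 14: right, left, right, then the middle leaf G -> S.
print(rank_from_path(6, "RLRM"))  # 43
```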
You should study this example carefully. It represents a very basic way of studying
recursions. In particular:


(a) You should be able to do the same analysis for a different path. 
(b) You should be able to start with an ending configuration and reconstruct the path. 


(c) You should be able to start with a configuration and, by attempting to reconstruct 
the path (and failing), be able to show that the configuration is illegal (can never 
arise). 


We've already discussed (a). When you know how to do (b), you should be able to do 
(c). How can we reconstruct the path from the ending configuration? Look at the final 
configuration in the previous figure and note where the largest washer is located. Since 
it is on G, it must have been moved from S to G. This can only happen on the middle 
edge leading out from the root. Hence we must take the rightmost branch and are on the 
path that starts H(5,E,S,G). We are now dealing with a configuration where washers 1-5 
start out stacked on E, washer 6 is on G and washer 6 will never move again. This takes 
care of washer 6 and we ignore it from now on. We are now faced with a new problem: 
There are 5 washers starting on E. In the final configuration shown in the figure, washer 5 
is still on E, so it has not moved in this new problem. Therefore, we must take the leftmost
branch from H(5,E,S,G). This is H(4,E,G,S). Again, we have a new problem with 4 washers
starting out on E. Since washer 4 must end up on S, we take the rightmost branch out of 
the vertex H(4,E,G,S). We continue in this manner until we reach H(1,...). If washer 1 
must move, that is the last move. Otherwise, the last move is the leaf to the left of the last 
right-pointing branch that we took. In our particular case, from H(4,E,G,S) we go right to 
H(3,G,E,S), right to H(2,E,G,S), left to H(1,E,S,G). Since washer 1 is on E, it has not yet 
moved in doing H(1,E,S,G). (Of course, it may have moved several times earlier.) The last 
right branch was H(2,E,G,S) from H(3,G,E,S) so that the last move was washer 3 from G 
to S. 


How could the previous process ever fail? Not all configurations arise. For example, if
washer 5 were on S, we would have decided to move it in H(5,E,S,G) since it is not on E.
But the only time H(5,E,S,G) moves washer 5 is from E to G, so it cannot end up on S. □
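Steps (b) and (c) can be sketched as a loop over the washers from largest to smallest. In this illustrative Python version (the function name and the list encoding of a configuration are assumptions), the result is the number of moves made so far, so the last move has RANK one less; `None` signals a configuration that can never arise.

```python
def configuration_to_moves(config, s="S", e="E", g="G"):
    """Number of moves of H(n, s, e, g) made before reaching config, or None.

    config[w - 1] is the pole holding washer w (washer 1 is the smallest).
    Working from the largest washer k down, in the current problem
    H(k, s, e, g):
      washer k on s -> it has not moved; follow the left branch H(k-1, s, g, e);
      washer k on g -> the left subtree and the move of washer k are done
                       (h_{k-1} + 1 = 2**(k-1) moves); follow H(k-1, e, s, g);
      washer k on e -> impossible, since H(k, s, e, g) never puts washer k there.
    """
    moves = 0
    for k in range(len(config), 0, -1):
        pole = config[k - 1]
        if pole == s:
            e, g = g, e                  # left branch H(k-1, s, g, e)
        elif pole == g:
            moves += 2 ** (k - 1)        # h_{k-1} moves plus the move of washer k
            s, e = e, s                  # right branch H(k-1, e, s, g)
        else:
            return None                  # illegal configuration
    return moves


# Ending configuration of Example 14, washers 1..6.
print(configuration_to_moves(["E", "E", "S", "S", "E", "G"]))  # 44
print(configuration_to_moves(["E", "E", "S", "S", "S", "G"]))  # None
```

The first call reproduces Example 14: 44 moves have been made, so the last one (washer 3 from G to S) has RANK 43. The second configuration puts washer 5 on S, which the text showed is impossible.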


We conclude this section with another “canonical” example of a recursive algorithm and its
decision tree. We want to look at all subsets of n. It will be more convenient to work with
the representation of subsets by functions with domain n and range {0,1}. For a subset S
of n, define

$$\chi_S(i) = \begin{cases} 1 & \text{if } i \in S,\\ 0 & \text{if } i \notin S.\end{cases}$$

This function is called the characteristic function of S. We have a characteristic function
for every subset S of n. In one line notation, these functions become n-strings of zeroes and
ones: The string $a_1 \ldots a_n$ corresponds to the subset T where i ∈ T if and only if $a_i = 1$.
Thus the all zeroes string corresponds to the empty set and the all ones string to n. This
correspondence is called the characteristic function interpretation of subsets of n.
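In code, the characteristic function interpretation is just a pair of conversions between subsets of n = {1, ..., n} and strings of zeroes and ones. A minimal Python sketch (the function names are hypothetical):

```python
def characteristic_string(subset, n):
    """One line notation a_1 ... a_n of the characteristic function of subset."""
    return "".join("1" if i in subset else "0" for i in range(1, n + 1))


def subset_from_string(bits):
    """The subset T of {1, ..., n} with i in T if and only if a_i = 1."""
    return {i for i, a in enumerate(bits, start=1) if a == "1"}


print(characteristic_string({1, 3}, 4))  # 1010
print(subset_from_string("0110"))        # {2, 3}
```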


Our goal is to make a list of all subsets of n such that subsets adjacent to each other 
in the list are “close” to each other. Before we can begin to look for such a Gray code, 
we must say what it means for two subsets (or, equivalently, two strings) to be close. Two 
strings will be considered close if they differ in exactly one position. In set terms, this means 
one of the sets can be obtained from the other by removing or adding a single element.
With this notion of closeness, a Gray code for all subsets when n = 1 is 0, 1. A Gray code 
for all subsets when n = 2 is 00, 01, 11, 10. 


How can we produce a Gray code for all subsets for arbitrary n? There is a simple 
recursive procedure. The following construction of the Gray code for n = 3 illustrates it. 


000    110
001    111
011    101
010    100


You should read down the first column and then down the second. Notice that the sequences 
in the first column begin with 0 and those in the second with 1. The rest of the first column 
is simply the Gray code for n = 2 while the second column is the Gray code for n = 2, read 
from the last sequence to the first. 


We now prove that this two column procedure for building a Gray code for subsets of
an n-set from the Gray code for subsets of an (n − 1)-set always works. Our proof will be
by induction. For n = 1, we have already exhibited a Gray code. Suppose that n > 1 and
that we have a Gray code for n − 1. (This is the induction assumption.)


• Between the bottom of the first column and the top of the second, the only change
is in the first position since the remaining n − 1 positions are the last element of our
Gray code for n − 1.


• Within a column, there is never any change in the first position and there is only a
single change from line to line in the remaining positions because they are a Gray code
by the induction assumption.


This completes the proof. 


As an extra benefit, we note that the last element of our Gray code differs in only one 
position from the first element (Prove it!), so we can cycle around from the last element to 
the first by a single change. 
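The two column construction and the claims just proved can be checked by machine for small n. A minimal Python sketch with hypothetical function names; `is_cyclic_gray_code` tests that every n-string appears exactly once and that neighbors, including the last and first elements, differ in exactly one position.

```python
def gray_code(n):
    """Subset Gray code for n >= 1 by the two column construction."""
    if n == 1:
        return ["0", "1"]
    previous = gray_code(n - 1)
    first_column = ["0" + w for w in previous]               # 0 + code for n - 1
    second_column = ["1" + w for w in reversed(previous)]    # 1 + reversed code
    return first_column + second_column


def differ_in_one_position(u, v):
    return sum(a != b for a, b in zip(u, v)) == 1


def is_cyclic_gray_code(codes, n):
    """All 2**n strings occur once and cyclic neighbors differ in one position."""
    all_strings = sorted(format(i, f"0{n}b") for i in range(2 ** n))
    if sorted(codes) != all_strings:
        return False
    return all(differ_in_one_position(codes[i], codes[(i + 1) % len(codes)])
               for i in range(len(codes)))


print(gray_code(3))  # ['000', '001', '011', '010', '110', '111', '101', '100']
for n in range(1, 11):
    assert is_cyclic_gray_code(gray_code(n), n)
```

A machine check for n ≤ 10 is of course no substitute for the induction proof; it only guards against slips in the construction.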


Example 15 (Decision tree for the subset Gray code) Here is another notation for
describing our subset Gray code. Let $\overrightarrow{\mathrm{GRAY}}(1) = 0, 1$, and let $\overleftarrow{\mathrm{GRAY}}(1) = 1, 0$. As the
arrows indicate, $\overrightarrow{\mathrm{GRAY}}(1)$ is the Gray code for n = 1 listed from first to last element, while
$\overleftarrow{\mathrm{GRAY}}(1)$ is this Gray code in reverse order. In general, if $\overrightarrow{\mathrm{GRAY}}(n)$ is the Gray code for
n-bit words, then $\overleftarrow{\mathrm{GRAY}}(n)$ is defined to be that list in reverse order.


We define

$$\overrightarrow{\mathrm{GRAY}}(2) = 0\overrightarrow{\mathrm{GRAY}}(1),\ 1\overleftarrow{\mathrm{GRAY}}(1).$$

The meaning of $0\overrightarrow{\mathrm{GRAY}}(1)$ is that 0 is put at the front of every string in $\overrightarrow{\mathrm{GRAY}}(1)$.
Juxtaposing the two lists (or “concatenation”) means just listing the second list after the
first. Thus, $0\overrightarrow{\mathrm{GRAY}}(1) = 00, 01$ and $1\overleftarrow{\mathrm{GRAY}}(1) = 11, 10$. Hence,

$$\overrightarrow{\mathrm{GRAY}}(2) = 0\overrightarrow{\mathrm{GRAY}}(1),\ 1\overleftarrow{\mathrm{GRAY}}(1) = 00, 01, 11, 10.$$

If we read $\overrightarrow{\mathrm{GRAY}}(2)$ in reverse order, we obtain $\overleftarrow{\mathrm{GRAY}}(2)$. You should verify the following
equality:

$$\overleftarrow{\mathrm{GRAY}}(2) = 1\overrightarrow{\mathrm{GRAY}}(1),\ 0\overleftarrow{\mathrm{GRAY}}(1).$$


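What we did for n = 2 works in general: The following diagram gives the local de-
scription of a decision tree for constructing subset Gray codes:

[Figure: local description. Left: $\overrightarrow{\mathrm{GRAY}}(n)$ has children $0\overrightarrow{\mathrm{GRAY}}(n-1)$ and $1\overleftarrow{\mathrm{GRAY}}(n-1)$, and $\overrightarrow{\mathrm{GRAY}}(1)$ has leaves 0 and 1. Right: $\overleftarrow{\mathrm{GRAY}}(n)$ has children $1\overrightarrow{\mathrm{GRAY}}(n-1)$ and $0\overleftarrow{\mathrm{GRAY}}(n-1)$, and $\overleftarrow{\mathrm{GRAY}}(1)$ has leaves 1 and 0.]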
The left side of the figure is a definition for $\overrightarrow{\mathrm{GRAY}}(n)$. We must verify two things:


• This definition gives us a Gray code.


• Given the left figure and the fact that $\overleftarrow{\mathrm{GRAY}}(n)$ is the reversal of $\overrightarrow{\mathrm{GRAY}}(n)$, the right
figure is correct.


The first part was already done because the figure simply describes the construction we
gave before we started this example. The second part is easy when we understand what
the tree means. Reading the $\overrightarrow{\mathrm{GRAY}}(n)$ tree from the right, we start with the reversal of
$1\overleftarrow{\mathrm{GRAY}}(n-1)$. Since $\overrightarrow{\mathrm{GRAY}}$ and $\overleftarrow{\mathrm{GRAY}}$ are defined to be reversals of each other, we get
$1\overrightarrow{\mathrm{GRAY}}(n-1)$. Similarly, reversing $0\overrightarrow{\mathrm{GRAY}}(n-1)$ gives $0\overleftarrow{\mathrm{GRAY}}(n-1)$.
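The two trees define $\overrightarrow{\mathrm{GRAY}}$ and $\overleftarrow{\mathrm{GRAY}}$ by mutual recursion. A minimal Python sketch (the names `gray_forward` and `gray_backward` are our own) builds both lists directly from the figures and confirms for small n that the right-hand figure really produces the reversal.

```python
def gray_forward(n):
    """GRAY->(n) from the left figure: 0 GRAY->(n-1) followed by 1 GRAY<-(n-1)."""
    if n == 1:
        return ["0", "1"]
    return ["0" + w for w in gray_forward(n - 1)] + ["1" + w for w in gray_backward(n - 1)]


def gray_backward(n):
    """GRAY<-(n) from the right figure: 1 GRAY->(n-1) followed by 0 GRAY<-(n-1)."""
    if n == 1:
        return ["1", "0"]
    return ["1" + w for w in gray_forward(n - 1)] + ["0" + w for w in gray_backward(n - 1)]


for n in range(1, 8):
    assert gray_backward(n) == gray_forward(n)[::-1]   # the claim argued in the text

print(gray_backward(2))  # ['10', '11', '01', '00']
```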


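If we apply the local description to the case n = 3, we obtain the following decision
tree:

[Figure: decision tree for $\overrightarrow{\mathrm{GRAY}}(3)$. The root has children $0\overrightarrow{\mathrm{GRAY}}(2)$ and $1\overleftarrow{\mathrm{GRAY}}(2)$; the next level is $00\overrightarrow{\mathrm{GRAY}}(1)$, $01\overleftarrow{\mathrm{GRAY}}(1)$, $11\overrightarrow{\mathrm{GRAY}}(1)$, $10\overleftarrow{\mathrm{GRAY}}(1)$; the leaves, left to right, are 000, 001, 011, 010, 110, 111, 101, 100.]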


In the above decision tree for $\overrightarrow{\mathrm{GRAY}}(3)$, the elements of the Gray code for n = 3 are
obtained by listing the labels on the edges for each path that ends in a leaf. These paths
are listed in preorder of their corresponding leaves (left to right in the picture). This gives
000, 001, 011, 010, 110, 111, 101, 100. You should practice doing the configuration analysis
for this recursion, analogous to Example 14. In particular, given a sequence, 10011101 say,
construct its path in the decision tree. What is the RANK of 10011101 in $\overrightarrow{\mathrm{GRAY}}(8)$? What
is the element in $\overrightarrow{\mathrm{GRAY}}(8)$ just before 10011101; just after 10011101?


Note in the above decision tree for $\overrightarrow{\mathrm{GRAY}}(3)$ that every time an edge with label 1 is
encountered (after the first such edge), that edge changes direction from the edge just prior
to it in the path. By “changing direction,” we mean that if an edge is sloping downward
to the right (downward to the left) and the previous edge in the path sloped downward
to the left (downward to the right), then a change of direction has occurred. Conversely,
every time an edge with label 0 is encountered (after the first such edge), that edge does
not change direction from the edge just prior to it in the path. This is a general rule that
can be proved by induction. □
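The direction change rule gives a way to find the path, and hence the RANK, of a word without listing the whole code. A minimal Python sketch, assuming RANK counts from 0 as in Example 14 (the function names are hypothetical): at a $\overrightarrow{\mathrm{GRAY}}$ vertex the 0-edge goes left and the 1-edge goes right, at a $\overleftarrow{\mathrm{GRAY}}$ vertex it is the other way round, and a 1 switches between the two kinds of vertex.

```python
def gray_rank(word):
    """RANK (counting from 0) of word in GRAY->(n), found by walking its path."""
    n = len(word)
    forward = True                    # are we at a GRAY-> vertex?
    rank = 0
    for i, bit in enumerate(word):
        goes_right = (bit == "1") == forward
        if goes_right:
            rank += 2 ** (n - i - 1)  # skip every leaf of the left subtree
        if bit == "1":
            forward = not forward     # a 1 switches between GRAY-> and GRAY<-
    return rank


def gray_unrank(rank, n):
    """Inverse of gray_rank: the element of GRAY->(n) with the given RANK."""
    forward = True
    bits = []
    for i in range(n):
        half = 2 ** (n - i - 1)       # leaves in each subtree at this depth
        goes_right = rank >= half
        if goes_right:
            rank -= half
        bit = "1" if goes_right == forward else "0"
        bits.append(bit)
        if bit == "1":
            forward = not forward
    return "".join(bits)


print([gray_unrank(r, 3) for r in range(8)])
# ['000', '001', '011', '010', '110', '111', '101', '100']
```

For the questions about 10011101, one can compute r = gray_rank("10011101") and then look at gray_unrank(r - 1, 8) and gray_unrank(r + 1, 8); we leave the values for you to check by hand.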


Exercises for Section 2 


2.1. Suppose the permutations on 8 are listed in lexicographic order. 
(a) What is the RANK in the list of all such permutations of 87612345? 
(b) What permutation has RANK 20,160? 


2.2. Consider the Towers of Hanoi puzzle, H(8, S, E, G). Suppose that pole S has washers
6, 5, 2, 1; pole E has no washers; pole G has washers 8, 7, 4, 3. Call this the basic 
configuration. 


(a) What is the path in the decision tree that corresponds to the basic configura- 
tion? 


(b) What was the move that produced the basic configuration and what was the 
configuration from which that move was made? 


(c) What was the move just prior to the one that produced the basic configuration 
and what was the configuration from which that move was made? 


(d) What will be the move just after the one that produced the basic configuration? 


(e) What is the RANK, in the list of all moves of H(8, S, E, G), of the move that
produced the basic configuration? 


2.3. Consider GRAY(9). 
(a) What is the element just before 110010000? just after 110010000? 
(b) What is the first element of the second half of the list? 
(c) What is the RANK of 111111111? 
(d) What is the element of RANK 372? 


*2.4. Consider the Towers of Hanoi puzzle with four poles and n washers. The rules 
are the same, except that there are two “extra” poles E and F. The problem is to 
transfer all of the n washers from S to G using the extra poles E and F as temporary
storage. Let $h'_n$ denote the optimal number of moves needed to solve the three pole
problem. Let $f_n$ denote the optimal number of moves needed to solve the four pole
problem with n washers.


(a) Recall that $h_n = 2^n - 1$ is the number of moves in the recursive algorithm
H(n, S, E, G). Prove by induction that $h'_n = h_n$.


(b) Compute $f_n$ for n = 1, 2, 3, describing, in the process, optimal sequences of
moves.


Let’s adopt a specific strategy for doing four poles and n washers. Choose integers
p ≥ 0 and q > 0 so that p + q = n. We now describe strategy G(p, q, S, E, F, G).
To execute G(p, q, S, E, F, G), proceed as follows:


(i) If p = 0, then q = n. Use H(n, S, E, G) to move washers 1, ..., n to G.


(ii) If p > 0, choose integers i ≥ 0 and j > 0 such that i + j = p. Use
G(i, j, S, E, G, F) to move washers 1, 2, ..., p to pole F (the washers are num-
bered in order of size). Next, use H(q, S, E, G) to move washers p + 1, ..., n to G.
Finally, use G(i, j, F, S, E, G) to move 1, 2, ..., p to pole G, completing the
transfer. For all possible choices of i and j, choose the one that minimizes the
number of moves.


Finally, to move the n washers, choose that G(p, q, S, E, F, G) with n = p + q
which has the minimum number of moves. Call this number $s_n$.


(c) What are the simplest cases in this recursive algorithm? How can you compute
the values of i and j to minimize the number of moves? Use your method to
solve the problem for n ≤ 6.


(d) What is the recursion for $s_n$?


(e) Prove that $f_n \le \min_q\,(2f_{n-q} + h_q)$, where the minimum is over 0 < q ≤ n.


Section 3: Decision Trees and Conditional Probability 


We conclude our discussion of decision trees by giving examples of the use of decision trees 
in elementary probability theory. In particular, we focus on what are called conditional 
probabilities and Bayesian methods in probability. 


Definition 4 (Conditional probability) Let U be a sample space with probability
function P. If A ⊆ U and B ⊆ U are events (subsets) of U, then the conditional probability
of B given A, denoted by P(B|A), is

$$P(B|A) = \begin{cases} P(A \cap B)/P(A), & \text{if } P(A) \neq 0,\\ \text{undefined}, & \text{if } P(A) = 0.\end{cases}$$
How should we interpret P(B|A)? If an experiment is performed n times and b of those
times B occurs, then b/n is nearly P(B). Furthermore, as n increases, the ratio b/n almost
surely approaches P(B) as a limit.* Now suppose an experiment is performed n times
but we are only interested in those times when A occurs. Furthermore, suppose we would
like to know the chances that B occurs, given that A has occurred. Let the count for A
be a and that for A ∩ B be c. Since we are interested only in the cases when A occurs,
only a of the experiments matter. In these a experiments, B occurred c times. Hence the
probability that B occurs given that A has occurred is approximately c/a = (c/n)/(a/n),
which is approximately P(A ∩ B)/P(A), which is the definition of P(B|A). As n increases,
the approximations almost surely approach P(B|A). Hence


P(B|A) should be thought of as the probability that B occurred, 
given that we know A occurred. 


Another way you can think of this is that we are changing to a new sample space A. To
define a probability function $P_A$ on this sample space, we rescale P so that $\sum_{a \in A} P_A(a) = 1$.
Since $\sum_{a \in A} P(a) = P(A)$, we must set $P_A(a) = P(a)/P(A)$. Then, the probability that B
occurs is the sum of $P_A(a)$ over all a ∈ B that are in our new sample space A. Thus

$$P_A(B) = \sum_{a \in A \cap B} P_A(a) = \sum_{a \in A \cap B} P(a)/P(A) = P(A \cap B)/P(A),$$

which is our definition of P(B|A).
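For a finite sample space, both descriptions of P(B|A) are easy to compute exactly. This minimal Python sketch uses `fractions.Fraction` to avoid rounding and represents P as a dictionary from outcomes to probabilities (an assumption of the sketch, as are the function names); `None` stands for “undefined”.

```python
from fractions import Fraction


def prob(P, event):
    """P(event) for a probability function P given as {outcome: probability}."""
    return sum((P[u] for u in event), Fraction(0))


def conditional(P, B, A):
    """P(B | A) = P(A & B) / P(A), or None when P(A) = 0 (undefined)."""
    pa = prob(P, A)
    if pa == 0:
        return None
    return prob(P, A & B) / pa


def rescaled(P, A):
    """The probability function P_A(a) = P(a) / P(A) on the new sample space A."""
    pa = prob(P, A)
    return {a: P[a] / pa for a in A}


die = {u: Fraction(1, 6) for u in range(1, 7)}   # a fair die
A, B = {2, 4, 6}, {4, 5, 6}                      # "even" and "at least 4"
print(conditional(die, B, A))                    # 2/3
print(prob(rescaled(die, A), A & B))             # 2/3
```

The two printed values agree, as the calculation above says they must.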


The following theorem contains some simple but important properties of conditional 
probability. 


Theorem 2 (Properties of conditional probability) Let (U, P) be a probability 
space. All events in the following statements are subsets of U and the conditional proba- 
bilities are assumed to be defined. (Recall that P(C|D) is undefined when P(D) = 0.) 


(a) P(B|U) = P(B) and P(B|A) = P(A ∩ B|A).

(b) A and B are independent events if and only if P(B|A) = P(B).

(c) (Bayes’ Theorem) P(A|B) = P(B|A)P(A)/P(B).

(d) $P(A_1 \cap \cdots \cap A_n) = P(A_1)\,P(A_2|A_1)\,P(A_3|A_1 \cap A_2) \cdots P(A_n|A_1 \cap \cdots \cap A_{n-1})$.


You can think of (b) as a justification for the terminology “independent” since P(B|A) = 
P(B) says that the probability of B having occurred is unchanged even if we know that A 
occurred; in other words, A does not influence B’s chances. We will encounter other forms 
of Bayes’ Theorem. All of them involve reversing the order in conditional probabilities. 
(Here, A|B and B|A.) 


Proof: All the proofs are simple applications of the definition of conditional probability, 
so we prove just (b) and (d) and leave (a) and (c) as exercises. 


* For example, we might toss a fair coin 100 times and obtain 55 heads, so b/n = 55/100 is
nearly 1/2 = P(head). With 10,000 tosses, we might obtain 4,930 heads, and 4,930/10,000
is even closer to 1/2 than 55/100. (This is the sort of accuracy one might realistically
expect.)
We prove (b). Suppose A and B are independent. By the definition of independence,
this means that P(A ∩ B) = P(A)P(B). Dividing both sides by P(A) and using the
definition of conditional probability, we obtain P(B|A) = P(B). For the converse, suppose
P(B|A) = P(B). Using the definition of conditional probability and multiplying by P(A),
we obtain P(A ∩ B) = P(A)P(B), which is the definition of independence.


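We prove (d) simply by using the definition of conditional probability and doing a lot
of cancellation of adjacent numerators and denominators:

$$\begin{aligned} &P(A_1)\,P(A_2|A_1)\,P(A_3|A_1 \cap A_2) \cdots P(A_n|A_1 \cap \cdots \cap A_{n-1})\\ &\quad= P(A_1)\,\frac{P(A_1 \cap A_2)}{P(A_1)}\,\frac{P(A_1 \cap A_2 \cap A_3)}{P(A_1 \cap A_2)} \cdots \frac{P(A_1 \cap \cdots \cap A_n)}{P(A_1 \cap \cdots \cap A_{n-1})}\\ &\quad= P(A_1 \cap \cdots \cap A_n). \end{aligned}$$

This completes the proof.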


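An alternative proof of (d) can be given by induction on n. For n = 1, (d) becomes
$P(A_1) = P(A_1)$, which is obviously true. For n > 1 we have

$$\begin{aligned} &P(A_1)\,P(A_2|A_1) \cdots P(A_{n-1}|A_1 \cap \cdots \cap A_{n-2})\,P(A_n|A_1 \cap \cdots \cap A_{n-1})\\ &\quad= \bigl(P(A_1)\,P(A_2|A_1) \cdots P(A_{n-1}|A_1 \cap \cdots \cap A_{n-2})\bigr)\,P(A_n|A_1 \cap \cdots \cap A_{n-1})\\ &\quad= P(A_1 \cap \cdots \cap A_{n-1})\,P(A_n|A_1 \cap \cdots \cap A_{n-1}) && \text{by induction}\\ &\quad= P(A_1 \cap \cdots \cap A_n) && \text{by Definition 4.} \end{aligned}$$

This completes the proof. ■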
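The identities in Theorem 2 can also be spot-checked numerically. The following Python sketch is a sanity check on one example, not a proof; the sample space (two fair dice) and the events are our own choices.

```python
from fractions import Fraction
from itertools import product

U = set(product(range(1, 7), repeat=2))   # ordered rolls of two fair dice


def P(event):
    """Uniform probability on U."""
    return Fraction(len(event), len(U))


def cond(B, A):
    """P(B | A); assumes P(A) != 0."""
    return P(A & B) / P(A)


A1 = {u for u in U if u[0] % 2 == 0}      # first die is even
A2 = {u for u in U if sum(u) >= 7}        # total is at least 7
A3 = {u for u in U if u[1] > u[0]}        # second die shows more than the first
B = {u for u in U if u[1] % 2 == 0}       # second die is even

assert cond(A1, A2) == cond(A2, A1) * P(A1) / P(A2)                  # (c)
assert P(A1 & A2 & A3) == P(A1) * cond(A2, A1) * cond(A3, A1 & A2)   # (d), n = 3
assert P(A1 & B) == P(A1) * P(B) and cond(B, A1) == P(B)             # (b)
assert cond(A2, A1) != P(A2)          # A1 and A2 are not independent
print(P(A1 & A2 & A3))                # 1/9
```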


Example 16 (Diagnosis and Bayes’ Theorem) Suppose we are developing a test to 
see if a person has a disease, say the dreaded wurfles. It’s known that 1 person in about 
500 has the wurfles. To measure the effectiveness of the test, we tried it on a lot of people. 
Of the 87 people with wurfles, the test always detected it, so we decide it is 100% effective 
at detection. We also tried the test on a large number of people who do not have wurfles 
and found that the test incorrectly told us that they have wurfles 3% of the time. (These
are called “false positives.”)


If the test is released for general use, what is the probability that a person who tests 
positive actually has wurfles? 


Let’s represent our information mathematically. Our probability space will be the
general population with the uniform distribution. The event W will correspond to having
wurfles and the event T will correspond to the test being positive. Our information can be
written

$$P(W) = 1/500 = 0.002, \qquad P(T|W) = 1, \qquad P(T|W^c) = 0.03,$$

and we are asked to find P(W|T). Bayes’ formula (Theorem 2(c)) tells us

$$P(W|T) = \frac{P(T|W)\,P(W)}{P(T)}.$$
Everything on the right is known except P(T). How can we compute it? The idea is to
partition T using W and then convert to known conditional probabilities:

$$\begin{aligned} P(T) &= P(T \cap W) + P(T \cap W^c) && \text{partition } T\\ &= P(T|W)\,P(W) + P(T|W^c)\,P(W^c) && \text{convert to conditional}\\ &= 1 \times 0.002 + 0.03 \times (1 - 0.002) \approx 0.032, \end{aligned}$$

where we have rounded off. Thus P(W|T) ≈ 1 × 0.002/0.032 ≈ 6%. In other words, even if
the test is positive, you only have about a 6% chance of having wurfles. This shows how
misleading a rather accurate test can be when it is used to detect a rare condition. □
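The arithmetic of Example 16 in a few lines of Python (the variable names are our own). Without rounding P(T), the answer is about 6.3%, which the text rounds to 6%.

```python
p_w = 1 / 500               # P(W): 1 person in 500 has wurfles
p_t_given_w = 1.0           # P(T | W): the test always detects wurfles
p_t_given_not_w = 0.03      # P(T | W^c): false positive rate

# Partition T using W, then convert to conditional probabilities.
p_t = p_t_given_w * p_w + p_t_given_not_w * (1 - p_w)
# Bayes' Theorem, Theorem 2(c).
p_w_given_t = p_t_given_w * p_w / p_t

print(f"P(T) = {p_t:.4f}")              # P(T) = 0.0319
print(f"P(W|T) = {p_w_given_t:.3f}")    # P(W|T) = 0.063
```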


Example 17 (Decision trees and conditional probability) We can picture the
previous example using a decision tree. We start out with the sample space U at the root.
Since we have information about how the test behaves when wurfles are present and when
they are absent, the first decision partitions U into W (has wurfles) and W^c (does not have
wurfles). Each of these is then partitioned according to the test result, T (test positive)
and T^c (test negative). Each edge has the form (A, B) and is labeled with the conditional
probability P(B|A). The labels P(W) and P(W^c) are equal to P(W|U) and P(W^c|U),
respectively (by Theorem 2(a)). Here is the decision tree for our wurfles test.


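[Figure: decision tree with root U. Edges labeled P(W) and P(W^c) lead to W and W^c. From W, edges labeled P(T|W) and P(T^c|W) lead to the leaves T ∩ W and T^c ∩ W; from W^c, edges labeled P(T|W^c) and P(T^c|W^c) lead to the leaves T ∩ W^c and T^c ∩ W^c.]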


If you follow the path (U, W, T), your choices were first W (has wurfles) then T (tests
positive). In terms of sets, these choices correspond to the event (i.e., set) T ∩ W of all
people who both test positive and have wurfles. Accordingly, the leaf that is at the end of
this path is labeled with the event T ∩ W. Similar “event” labels are placed at the other
leaves.


Using the definition of conditional probability, you should be able to see that the
probability of the event label at a vertex is simply the product of the probabilities on the
edges along the path from the root to the vertex. For example, to compute P(T ∩ W^c)
we multiply P(W^c) and P(T|W^c) = P(T ∩ W^c)/P(W^c). Numerically this is 0.998 ×
0.03 ≈ 0.03. To compute P(T ∩ W) we multiply P(W) and P(T|W). Numerically this is
0.002 × 1.0 = 0.002. Here is the tree with the various numerical values of the probabilities
shown.


[Figure: the wurfles decision tree with edge probabilities P(W) = 0.002, P(W^c) = 0.998, P(T|W) = 1.0, P(T^c|W) = 0, P(T|W^c) = 0.03, P(T^c|W^c) = 0.97, and leaves T ∩ W, T^c ∩ W, T ∩ W^c, T^c ∩ W^c.]


Using the above tree and the computational rules described in the previous paragraph,
we can compute P(W|T) = P(W ∩ T)/P(T) = P(T ∩ W)/P(T) as follows.


1. Compute P(T) by adding up the probabilities of the event labels of all leaves
that are associated with the decision T. (These are the event labels T ∩ W and
T ∩ W^c.) Thus, P(T) = P(T ∩ W) + P(T ∩ W^c). Using the actual probabilities
on the edges of the decision tree we get P(T) = 0.002 × 1.0 + 0.998 × 0.03 ≈ 0.032.


2. Compute P(W|T) using P(W|T) = P(T ∩ W)/P(T). Using the computation in
step (1), we get P(W|T) = P(T ∩ W)/P(T) = (0.002 × 1.0)/0.032 ≈ 0.06.


These are the same calculations we did in the previous example, so why go to the extra
trouble of drawing the tree? The tree gives us a systematic method for recording
data and carrying out the calculations. □
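The two step method works for any tree, so it is natural to write it once. Here is a minimal Python sketch in which a tree is a nested (label, edge probability, children) tuple (our own representation, not the book's); leaf probabilities are products along paths, and a conditional probability is a ratio of sums of leaf probabilities.

```python
from fractions import Fraction

# A tree node is (label, probability of the edge into it, list of children).
wurfles_tree = ("U", Fraction(1), [
    ("W", Fraction(1, 500), [
        ("T", Fraction(1), []),
        ("T^c", Fraction(0), []),
    ]),
    ("W^c", Fraction(499, 500), [
        ("T", Fraction(3, 100), []),
        ("T^c", Fraction(97, 100), []),
    ]),
])


def leaf_probabilities(node, path=(), prob=Fraction(1)):
    """Yield (path of labels, probability) for each leaf.

    Step 1 of the text: a leaf's probability is the product of the edge
    probabilities on the path from the root to that leaf.
    """
    label, edge_prob, children = node
    path, prob = path + (label,), prob * edge_prob
    if not children:
        yield path, prob
    for child in children:
        yield from leaf_probabilities(child, path, prob)


leaves = dict(leaf_probabilities(wurfles_tree))
p_t = sum(p for path, p in leaves.items() if path[-1] == "T")   # Step 1: P(T)
p_w_given_t = leaves[("U", "W", "T")] / p_t                        # Step 2: P(W|T)
print(f"P(T) = {float(p_t):.4f}, P(W|T) = {float(p_w_given_t):.3f}")
# P(T) = 0.0319, P(W|T) = 0.063
```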


In the previous example, each vertex was specifically labeled with the event, such as
W ∩ T^c, associated with it. In the next example, we simply keep track of the information
we need to compute our answer. 


Example 18 (Another decision tree with probabilities) We are given an urn with 
one red ball and one white ball. A fair die is thrown. If the number is 1, then 1 red ball 
and 2 white balls are added to the urn. If the number is 2 or 3, then 2 red balls and 3 
white balls are added to the urn. If the number is 4, 5, or 6, then 3 red balls and 4 white 
balls are added to the urn. A ball is then selected uniformly at random from the urn. We 
represent the situation in the following decision tree. 


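[Figure: decision tree with root [1R, 1W]. The die outcome {1} (probability 1/6) leads to [2R, 3W], the outcome {2,3} (probability 1/3) leads to [3R, 4W], and the outcome {4,5,6} (probability 1/2) leads to [4R, 5W]. Drawing a red or a white ball then gives the leaves [1R, 3W] and [2R, 2W], [2R, 4W] and [3R, 3W], [3R, 5W] and [4R, 4W], with leaf probabilities 1/15, 1/10, 1/7, 4/21, 2/9, 5/18.]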
The root is represented by the initial composition of the urn. The children of the root
[1R, 1W] are [2R, 3W], [3R, 4W], and [4R, 5W]. Beside each of these children of the root
is the outcome set of the roll of the die that produces that urn composition: {1}, {2,3},
{4,5,6}. The probabilities on the edges incident on the root are the probabilities of the
outcome sets of the die. The probabilities on the edges incident on the leaves are the con-
ditional probabilities as discussed in Example 17. Thus, 3/7 is the conditional probability
that the final outcome is R, given that the outcome of the die was in the set {2,3}.


Here is a typical sort of question asked about this type of probabilistic decision tree: 
“Given that the ball drawn was red, what is the probability that the outcome of the die was
in the set {2,3}?” We could write this mathematically as P({2,3} | R), where {2,3} repre-
sents the result of rolling the die and R represents the result of the draw. Note in this process
that the basic data given are conditional probabilities of the form P(drawing is R | die in S). 
We are computing conditional probabilities of the form P(die roll in S | drawing is R). This 
is exactly the same situation as in Example 17. Thus our question is answered by carrying 
out the two steps in Example 17: 


1. Add up the probabilities of all leaves resulting from the drawing of a red ball to
obtain P(R) = 1/15 + 1/7 + 2/9 = 136/315. (The probabilities of the leaves were
computed by multiplying along the paths from the root. The results for all leaves
are shown in the picture of the decision tree.)


2. Compute the conditional probability P({2,3} | R) by dividing P({2,3} ∩ R) =
1/7 by P(R), the answer from Step 1. In this case, we get
(1/7)/(136/315) = 45/136 ≈ 0.331.


If you wish, you can think of this problem in terms of a new sample space. The 
elements of the sample space are the leaves. Step 1 (multiplying probabilities along paths) 
computes the probability function for this sample space. Since an event is a subset of the 
sample space, an event is a set of leaves and its probability is the sum of the probabilities 
of the leaves it contains. Can we interpret the nonleaf vertices? Yes. Each such vertex 
represents an event that consists of the set of leaves below it. Many people prefer this 
alternative way of thinking about the decision tree. □
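The same two steps for the urn, computed exactly with fractions. A minimal Python sketch; the data layout is our own.

```python
from fractions import Fraction

# (die outcomes, red balls added, white balls added); the urn starts as [1R, 1W].
branches = [({1}, 1, 2), ({2, 3}, 2, 3), ({4, 5, 6}, 3, 4)]

red_leaf = {}                                   # probability of each "red drawn" leaf
for outcomes, red_added, white_added in branches:
    p_roll = Fraction(len(outcomes), 6)         # edge from the root (fair die)
    red, white = 1 + red_added, 1 + white_added
    p_red = Fraction(red, red + white)          # edge to the leaf: P(R | roll)
    red_leaf[frozenset(outcomes)] = p_roll * p_red

p_r = sum(red_leaf.values())                              # Step 1: P(R)
p_23_given_r = red_leaf[frozenset({2, 3})] / p_r          # Step 2: P({2,3} | R)
print(p_r, p_23_given_r, round(float(p_23_given_r), 3))   # 136/315 45/136 0.331
```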


The procedure we used to compute conditional probabilities in Steps 1 and 2 of the two
previous examples can be stated as a formula, which is another form of Bayes’ Theorem:


Theorem 3 (Bayes’ Theorem) Let (U, P) be a probability space, let
{A_i : i = 1, 2, ..., n} be a partition of U, and let B ⊆ U. Then, for each i,

$$P(A_i|B) = \frac{P(A_i)\,P(B|A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B|A_j)}.$$


Most students find decision trees much easier to work with than trying to apply the formal 
statement of Bayes’ theorem. Our proof will closely follow the terminology of Example 17. 


Proof: We can draw a decision tree like the ones in the previous examples, but now there
are n edges of the decision tree coming down from the root and 2 edges coming down from
each child of the root. Here is a decision tree for this generalization:


[Figure: decision tree whose root has children A_1, A_2, ..., A_n; each A_i has two leaves, A_i ∩ B and A_i ∩ ~B.]


We follow the two step process of Example 17. In doing this, we need to compute, for
1 ≤ i ≤ n, the products of the probabilities along the path passing through the vertex $A_i$
and leading to the leaves labeled by the events $A_i \cap B = B \cap A_i$.


1. Add up the probabilities of all leaves contained in B, i.e., add up $P(A_i)P(B|A_i)$
over 1 ≤ i ≤ n to obtain P(B).

2. Compute $P(A_i|B) = P(A_i \cap B)/P(B)$. Since $P(A_i \cap B) = P(A_i)P(B|A_i)$, this
quotient is the formula in the theorem.


This process gives the formula in the theorem. 
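Theorem 3 translates into a short function. A minimal Python sketch (the name `bayes` is hypothetical) that takes the priors $P(A_i)$ and the likelihoods $P(B|A_i)$ as parallel lists, applied here to the urn of Example 18 with B = “red ball drawn”:

```python
from fractions import Fraction


def bayes(priors, likelihoods):
    """Posteriors P(A_i | B) for a partition A_1, ..., A_n of U.

    priors[i] = P(A_i) and likelihoods[i] = P(B | A_i).  Each numerator is
    the leaf probability P(A_i) P(B | A_i); their sum is P(B) (Step 1).
    """
    leaves = [p * q for p, q in zip(priors, likelihoods)]
    total = sum(leaves)
    if total == 0:
        raise ValueError("P(B) = 0, so P(A_i | B) is undefined")
    return [leaf / total for leaf in leaves]   # Step 2


# Example 18: die outcome sets {1}, {2,3}, {4,5,6}.
posterior = bayes([Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)],
                  [Fraction(2, 5), Fraction(3, 7), Fraction(4, 9)])
print(posterior)  # [Fraction(21, 136), Fraction(45, 136), Fraction(35, 68)]
```

The middle entry is the answer 45/136 found in Example 18; the posteriors always sum to 1, since the $A_i$ partition U.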


All of our probabilistic decision trees discussed thus far have had height two. However, 
probabilistic decision trees can have leaves at any distance from the root and different 
leaves may be at different distances. The two step procedure in Example 17 contains no 
assumptions about the height of leaves and, in fact, will work for all trees. The next 
example illustrates this. 


Example 19 (Tossing coins) Suppose you have two coins. One has heads on both sides 
and the other is a normal coin. You select a coin randomly and toss it. If the result is 
heads, you switch coins; otherwise you keep the coin you just tossed. Now toss the coin 
you're holding. What is the probability that the result of the toss is heads? Here is the 
decision tree. 


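[Figure: decision tree. From the root, two edges of probability 1/2 lead to picking the two headed coin (hh) or the normal coin (ht). Tossing hh gives H:ht with probability 1; tossing ht gives H:hh or T:ht, each with probability 1/2. The final toss gives the leaves H and T below H:ht, H below H:hh, and H and T below T:ht, with probabilities 1/4, 1/4, 1/4, 1/8, 1/8.]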


The labels hh and ht indicate which coin you’re holding: two headed or normal. The
labels H and T indicate the result of the toss. A label like H:ht means the toss was H and
so you are now holding the ht coin. The conditional probabilities are on the edges and the leaf
probabilities were computed by multiplying the probabilities along the paths, as required
by Step 1. Adding up, we find that the probability of heads is 1/4 + 1/4 + 1/8 = 5/8.


Given that the final toss is heads, what is the probability that you’re holding the 
double-headed coin? The leaf where you’re holding the double headed coin and tossed a 
head is the middle leaf, which has probability 1/4, so the answer is (1/4)/(5/8) = 2/5. 


Given that the final toss is heads, what is the probability that the coin you picked up
at the start was not double headed? This is a bit different from what we’ve done before
because there are two leaves associated with this event. Since the formula for conditional
probability is

$$\frac{P((\text{chose ht}) \cap (\text{second toss was H}))}{P(\text{second toss was H})},$$

we simply add up the probability of those two leaves to get the numerator and so our
answer is (1/4 + 1/8)/(5/8) = 3/5.


In the last paragraph we introduced a generalization of our two step procedure: If an
event corresponds to more than one leaf, we add up the probability of those leaves. □


Example 20 (The Monty Hall Problem—Goats and Cars) The Monty Hall problem 
is loosely based on the television game show “Let’s Make a Deal.” A common statement of 
the problem is that a contestant is shown three doors. There is a new car behind one door 
and a goat behind each of the other two. The contestant first chooses a door but doesn’t 
open it. The game show host, Monty Hall, then opens a different door which invariably 
reveals a goat since he knows the location of the car. He then gives the contestant the 
opportunity to switch her choice to a different door or remain with her original choice. She 
will be given whatever is behind the final door she chooses. 


It is best, intuitively, to consider two strategies: never switch and always switch. One 
can then consider the mixed strategy of sometimes switching and sometimes not. It also 
helps our understanding to make the problem a little more general. 


Suppose there are n doors (instead of 3), one of which hides a car. Suppose that Monty
Hall opens k doors (instead of 1) to reveal goats. We need 1 ≤ k ≤ n − 2 for the problem
to be interesting. You should think about why k = 0 and k = n − 1 are not interesting.


Consider first the case where the contestant never switches. The probability that the 
car will be behind the chosen door is 1/n since the contestant has no idea which door hides 
the car. Thus, 1/n is the probability of winning the car each time a contestant who never 
switches plays the game. If she plays the game 1000 times, she would expect to win about 
1000/n times. 
Now consider someone who always switches. The next figure should help.


[Figure: “Always Switch” decision tree. There are n doors, a car behind one and goats behind the rest; after the first choice, k doors, 1 ≤ k ≤ n − 2, are opened, revealing goats.]


As shown in the above figure, the contestant’s first choice is a car with probability 1/n or
a goat with probability (n − 1)/n. The host opens k doors from among the n − 1 doors
not chosen by the contestant. If the contestant’s first choice was a car, she must lose when
she switches. Otherwise, one of the other unopened doors hides a car. How many of these
unopened doors are there? The contestant picked 1 and Monty opened k, so there are
n − (k + 1). Since each unopened door is equally likely to hide the car, the chances of
winning (pointing to the car) in this case are 1/(n − (k + 1)). Of course we must remember
that this is a conditional probability: the probability of winning given that the first choice
was a goat. Thus the probability of winning is

$$\frac{n-1}{n}\cdot\frac{1}{n-(k+1)} = \frac{n-1}{n-(k+1)}\cdot\frac{1}{n} > \frac{1}{n},$$

where the inequality holds because k ≥ 1 makes n − 1 > n − (k + 1). Thus it is better, on
average, to switch.


Going back to the original 3-door problem, the non-switcher has a probability of winning 
equal to 1/3 and the switcher 2/3. □ 
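The following minimal Python sketch checks this computation. It is our own illustration, not part of the text, and the helper names `win_probability` and `simulate` are hypothetical. The simulation assumes, as the analysis above does, that a switching contestant chooses uniformly among the doors that are still closed.

```python
import random
from fractions import Fraction


def win_probability(n, k, switch):
    """Exact chance of winning the car with n doors when Monty opens k goat doors."""
    if not 1 <= k <= n - 2:
        raise ValueError("need 1 <= k <= n - 2")
    if not switch:
        return Fraction(1, n)  # the first pick is the car with probability 1/n
    # First pick is a goat ((n-1)/n); the car is then equally likely to be
    # behind any of the n - (k + 1) doors that are still closed and unchosen.
    return Fraction(n - 1, n) * Fraction(1, n - (k + 1))


def simulate(n, k, switch, trials, rng):
    """Monte Carlo estimate of the same probability (doors are 0, ..., n-1)."""
    wins = 0
    for _ in range(trials):
        car = rng.randrange(n)
        choice = rng.randrange(n)
        # Monty knows where the car is and opens k goat doors other than the choice.
        goat_doors = [d for d in range(n) if d != choice and d != car]
        opened = set(rng.sample(goat_doors, k))
        if switch:
            # Switch to one of the remaining closed doors, chosen uniformly.
            closed = [d for d in range(n) if d != choice and d not in opened]
            choice = rng.choice(closed)
        wins += choice == car
    return wins / trials


if __name__ == "__main__":
    print(win_probability(3, 1, False), win_probability(3, 1, True))  # 1/3 2/3
    print(simulate(3, 1, True, 100_000, random.Random(2024)))  # roughly 0.667
```

Exact fractions avoid rounding in the formula, while the simulation only estimates it. With many trials, the estimate should come close to the exact value.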


Generating Objects at Random 


To test complicated algorithms, we may want to run the algorithm on a lot of random 
problems. Even if we know the algorithm works, we may want to do this to study the speed 
of the algorithm. Computer languages include routines for generating random numbers. 
What can we do if we want something more complicated? 


In Section 4 of Unit Fn, we gave an algorithm for generating random permutations. 
Here we show how to generate random objects using a decision tree. 
Let (U, P) be a probability space. Suppose we want to choose elements of U at random 
according to the probability function P. In other words, $u \in U$ will have a probability P(u) 
of being chosen each time we choose an element. This is easy to do if we have a decision tree 
whose leaves correspond to the elements of U. The process is best understood by looking 
at an example. 


Example 21 (Generating random words) In Example 2 we looked at the problem of 
counting certain types of “words.” Go back and review that example before continuing. 


We want to generate those words at random. 


We’ll use a two-step approach. First, we’ll select a CV-pattern corresponding to one 
of the leaves in the tree from Example 2. (We’ve reproduced the figure below. For the 
present, ignore the numbers in the figure.) Second, we’ll generate a word at random that 
fits the pattern. 


[Figure: the decision tree of CV-patterns from Example 2, with vertex probabilities. Root 1.00; C 0.73, V 0.27; CC 0.49, CV 0.24; VCV 0.15. Leaves: CCVCC 0.3710, CCVCV 0.1172, CVCCV 0.1172, CVCVC 0.1233, VCCVC 0.1172, VCVCC 0.1172, VCVCV 0.0370.]


Generating a random word to fit the pattern is simple. To illustrate, suppose the 
pattern is CCVCV. Since there are 20 choices for the first C, use the computer software to 
generate a random number between 1 and 20 to decide what consonant to choose for C. 
The second C has 19 choices since adjacent consonants must be different, and so on. Here’s 
the result of some random choices: 


position  number of  random  letter
& type    choices    number  chosen  comments

1  C      20         5       G       5th among consonants (BCDFG...)
2  C      19         11      P       11th among consonants except G
3  V      6          2       E       2nd among vowels (AEIOUY)
4  C      20         11      N       11th among consonants (BCDFG...)
5  V      6          3       I       3rd among vowels (AEIOUY)
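The word-filling step can be sketched in Python as follows. The alphabets are assumptions read off the table: Y counts as a vowel, so the 20 consonants are the remaining letters in alphabetical order. The names `options`, `word_from_numbers`, and `random_word` are hypothetical. Feeding in the random numbers from the table reproduces the word chosen there.

```python
import random

# Assumed alphabet, following the table: Y counts as a vowel, leaving 20 consonants.
CONSONANTS = "BCDFGHJKLMNPQRSTVWXZ"
VOWELS = "AEIOUY"


def options(pattern, letters, i):
    """Letters allowed at position i, given the letters already chosen."""
    if pattern[i] == "V":
        return VOWELS
    if i > 0 and pattern[i - 1] == "C":
        # Adjacent consonants must differ: drop the previous one (19 choices).
        return [c for c in CONSONANTS if c != letters[i - 1]]
    return CONSONANTS


def word_from_numbers(pattern, numbers):
    """Turn 1-based random numbers into letters, as in the table."""
    letters = []
    for i, r in enumerate(numbers):
        letters.append(options(pattern, letters, i)[r - 1])
    return "".join(letters)


def random_word(pattern, rng):
    """Generate a word fitting the pattern, each allowed letter equally likely."""
    letters = []
    for i in range(len(pattern)):
        allowed = options(pattern, letters, i)
        letters.append(allowed[rng.randint(1, len(allowed)) - 1])
    return "".join(letters)


print(word_from_numbers("CCVCV", [5, 11, 2, 11, 3]))  # GPENI, as in the table
print(random_word("CCVCV", random.Random(0)))
```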


How should we choose a pattern? We discovered in Example 2 that some patterns fit 
more words than other patterns fit. Each pattern should be chosen in proportion to the 
number of words it fits so that each word will have an equal chance. Using the counts in 
Example 2, we computed the probabilities of the leaves in the preceding figure. Thus 


$$P(\text{leaf}) = \frac{\text{number of words with leaf pattern}}{\text{total number of words}}.$$

You should compute those values yourself. We have constructed a probability space where 
U is the set of leaves in the tree and P has the values shown at the leaves. Each vertex in 
the tree corresponds to an event, namely the set of leaves that are below it in the tree. Thus 
we can compute the probability of each vertex in the tree by adding up the probabilities of 
the leaves below that vertex. Many of those probabilities are shown in the previous figure. 


How do we generate a leaf at random using the probabilities we’ve computed? We start 
at the root of the tree and choose a path randomly as follows. If we are at a vertex v that has 
edges (v, w), (v, x) and (v, y), we simply choose among w, x and y by using the conditional 
probabilities $P(w|v) = P(w)/P(v)$, $P(x|v) = P(x)/P(v)$ and $P(y|v) = P(y)/P(v)$. In 
other words, choose w with probability $P(w|v)$ and so on. Multiplying these conditional 
probabilities along the path from the root (whose probability is 1) to a leaf telescopes to the 
probability of that leaf, as required. (This can be done using random number generators on 
computers.) 
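Here is a Python sketch of this tree walk for the five-letter patterns of Example 21. We assume the rules of Example 2 are: no two adjacent vowels, no three adjacent consonants, and adjacent consonants must differ, with 20 consonants and 6 vowels. Under these assumptions the counts reproduce the leaf probabilities in the figure. All names are hypothetical.

```python
import random
from itertools import product

# Assumptions (ours): 20 consonants, 6 vowels, and the pattern rules of Example 2
# taken as "no two adjacent vowels, no three adjacent consonants".
N_CONSONANTS, N_VOWELS, LENGTH = 20, 6, 5


def is_pattern(p):
    return "VV" not in p and "CCC" not in p


def word_count(p):
    """Words fitting pattern p; a consonant after a consonant has 19 choices."""
    total = 1
    for i, kind in enumerate(p):
        if kind == "V":
            total *= N_VOWELS
        elif i > 0 and p[i - 1] == "C":
            total *= N_CONSONANTS - 1
        else:
            total *= N_CONSONANTS
    return total


LEAVES = ["".join(t) for t in product("CV", repeat=LENGTH) if is_pattern("".join(t))]


def weight(vertex):
    """Words below a vertex (a prefix); P(vertex) = weight(vertex) / weight("")."""
    return sum(word_count(leaf) for leaf in LEAVES if leaf.startswith(vertex))


def random_pattern(rng):
    """Walk down from the root, moving from v to child w with probability P(w|v)."""
    v = ""
    while len(v) < LENGTH:
        children = [v + kind for kind in "CV" if weight(v + kind) > 0]
        # random.choices normalizes the weights, so w is picked with
        # probability weight(w) / weight(v) = P(w) / P(v).
        v = rng.choices(children, weights=[weight(w) for w in children])[0]
    return v


for leaf in LEAVES:  # leaf probabilities, as shown in the figure
    print(leaf, round(word_count(leaf) / weight(""), 4))
print(random_pattern(random.Random(0)))
```

The sketch recomputes each vertex weight by summing over the leaves below it. That is fine for 7 leaves, but for a huge tree one would instead compute the conditional probabilities from a formula.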


Someone might say: 


All that work with the tree is not necessary since we can use the following 
“direct” method: Using the leaf probabilities, generate a random pattern. 


That approach is not always feasible. For example, suppose we wanted to generate a 
random strictly decreasing function from 100 to 200. We learned in Section 3 of Unit Fn 
that there are $\binom{200}{100}$ of these functions. This number is about $9 \times 10^{58}$. Many random 
number generators cannot reliably generate random integers between 1 and a number this 
large. Thus we need a different method. One way is to use a decision tree that lists the 
functions. It’s a much bigger tree than we’ve looked at, but we don’t need to construct the 
tree. All we need to know is how to compute the conditional probabilities so that each leaf 
will have probability $1/\binom{200}{100}$. It turns out that this can be done rather easily. In summary, 
the tree method can be used when the “direct” method is not practical. □ 
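Below is a Python sketch of one way to build such a tree. It is essentially the classical selection-sampling technique and not necessarily what the authors had in mind. A strictly decreasing function from {1, ..., k} to {1, ..., n} is determined by its set of k values; with k = 100 and n = 200 there are $\binom{200}{100}$ of them. So for v = n, n − 1, ..., 1 we decide whether v is a value. If r values are still needed and m = v candidates remain, v is taken with probability $\binom{m-1}{r-1}/\binom{m}{r} = r/m$, so only small integers are ever needed. The function names are hypothetical. The helper `leaf_probability` multiplies the conditional probabilities along a path to confirm that each leaf gets $1/\binom{n}{k}$.

```python
import random
from fractions import Fraction
from math import comb


def random_decreasing_function(k, n, rng):
    """Random strictly decreasing f from {1..k} to {1..n}, as [f(1), ..., f(k)].

    Decision tree: for v = n, n-1, ..., 1 decide whether v is a value of f.
    If r values are still needed and v, v-1, ..., 1 are the m = v candidates,
    v is used with conditional probability C(m-1, r-1) / C(m, r) = r / m.
    """
    values, r = [], k
    for m in range(n, 0, -1):
        if r == 0:
            break
        if rng.randrange(m) < r:  # true with probability exactly r/m
            values.append(m)
            r -= 1
    return values


def leaf_probability(values, n):
    """Multiply the conditional probabilities along the path to this leaf."""
    prob, r, chosen = Fraction(1), len(values), set(values)
    for m in range(n, 0, -1):
        if r == 0:
            break  # the remaining decisions are forced (probability 1)
        if m in chosen:
            prob *= Fraction(r, m)
            r -= 1
        else:
            prob *= Fraction(m - r, m)
    return prob


f = random_decreasing_function(100, 200, random.Random(1))
print(f[:5], len(f))
print(leaf_probability([5, 3], 5) == Fraction(1, comb(5, 2)))  # True
```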


*The First Moment Method and the SAT Problem 


We now review briefly the concept of conjunctive normal form. Suppose $p, q, r, \ldots$ are 
Boolean variables (that is, variables that can be 0 or 1). The operations $\sim$, $\vee$, and 
$\wedge$ stand for “negation”, “or”, and “and”, respectively. A disjunctive clause is a list of 
Boolean variables and negations of Boolean variables joined by $\vee$. Here are four examples 
of disjunctive clauses: 


$$q \vee r, \qquad p \vee (\sim q) \vee r, \qquad (\sim p) \vee (\sim q) \vee (\sim r), \qquad (\sim r) \vee q.$$


Conjunctive normal form is a statement form consisting of disjunctive clauses joined by $\wedge$; 
for example 


$$(q \vee r) \wedge (p \vee (\sim q) \vee (\sim r)) \wedge ((\sim p) \vee (\sim q) \vee r) \wedge ((\sim r) \vee q).$$
(Disjunctive normal form is the same as conjunctive normal form except that $\wedge$ and $\vee$ are 
switched.) The satisfiability problem is the following. Given a statement in conjunctive 
normal form, is there some choice of values for the Boolean variables that makes the 
statement equal to 1? One may also want to know what choice of variables does this. The 
satisfiability problem is also called the SAT problem. The SAT problem is known to be 
hard in general. (The technical term is “NP-complete”.) 


Given a statement in conjunctive normal form, how might we try to solve the satisfiability 
problem? One way is with the following backtracking algorithm for a statement 
involving $p_1, p_2, \ldots, p_n$: 


Step 1. Set k = 1. 
Step 2. Set $p_k = 0$. 


Step 3. (Test) Check to see if any of the clauses that contain only $p_1, \ldots, p_k$ are 0. If so, 
go to Step $4_0$; if not, go to Step $4_1$. 


Step $4_0$. (Failure) If $p_k = 0$, set $p_k = 1$ and go to Step 3. Otherwise $p_k = 1$: if k = 1, stop 
(no solution); if k > 1, replace k with k − 1 and go to Step $4_0$. 


Step $4_1$. (Partial success) If k = n, stop because the current values of the variables make 
the statement 1. If k < n, replace k with k + 1 and go to Step 2. 


You should use the algorithm on the conjunctive normal form statement given earlier. 
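Here is a minimal Python sketch of Steps 1 through 4. It is our illustration: the signed-integer clause encoding (similar to the DIMACS CNF convention) and the function names are assumptions. We read the statement given earlier as (q ∨ r) ∧ (p ∨ ~q ∨ ~r) ∧ (~p ∨ ~q ∨ r) ∧ (~r ∨ q), numbering p, q, r as 1, 2, 3. Running the sketch therefore reveals one answer to the exercise just posed.

```python
# Literals as signed integers (an assumed encoding): +i means p_i, -i means ~p_i.
# With p = 1, q = 2, r = 3, this encodes
# (q v r) ^ (p v ~q v ~r) ^ (~p v ~q v r) ^ (~r v q).
S = [[2, 3], [1, -2, -3], [-1, -2, 3], [-3, 2]]


def clause_value(clause, p):
    """1 if some literal of the clause is true under p[1..n], else 0."""
    return int(any(p[lit] == 1 if lit > 0 else p[-lit] == 0 for lit in clause))


def backtrack_sat(clauses, n):
    """Steps 1-4 of the backtracking algorithm; returns [p_1..p_n] or None."""
    p = [None] * (n + 1)  # p[0] is unused so that p[i] is p_i
    k = 1                 # Step 1
    p[k] = 0              # Step 2
    while True:
        # Step 3 (test): only clauses whose variables are all among p_1..p_k.
        failed = any(max(abs(lit) for lit in c) <= k and clause_value(c, p) == 0
                     for c in clauses)
        if failed:
            # Step 4_0 (failure): back up past variables already set to 1.
            while p[k] == 1:
                if k == 1:
                    return None  # no solution
                k -= 1
            p[k] = 1             # then go back to Step 3
        elif k == n:
            return p[1:]         # Step 4_1: every clause has the value 1
        else:
            k += 1               # Step 4_1: move on to the next variable
            p[k] = 0             # Step 2


print(backtrack_sat(S, 3))  # [0, 1, 0], i.e. p = 0, q = 1, r = 0
```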


The following example shows that we can sometimes guarantee that the algorithm 
will succeed if there are not too many clauses. However, it does not give us values of the 
variables that will make the statement 1. In the example after that, we will see how to use 
the idea from the next example to find those values without backtracking. 


Example 22 (SAT with just a few clauses) Suppose we have a conjunctive normal 
form statement $S = C_1 \wedge C_2 \wedge \cdots \wedge C_k$, where the $C_i$ are clauses in the Boolean variables 
$p_1, \ldots, p_n$. Make the set $\times^n\{0,1\}$, the possible values for $p_1, \ldots, p_n$, into a probability 
space by letting each n-tuple have probability $1/2^n$. 


Let $X_i$ be a random variable whose value is 1 if $C_i$ has the value 0, and let $X_i = 0$ if 
$C_i$ has the value 1. (Be careful: note the reversal; $X_i$ is the opposite of $C_i$.) The number 
of clauses which are 0 is $X = X_1 + \cdots + X_k$. If we can show that $P(X = 0) > 0$, we will 
have shown that there is some choice of $p_1, \ldots, p_n$ for which all clauses are 1, and so S will 
be 1 as well. How can we do this? Here is one tool: 


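Theorem 4 (First Moment Method) Suppose that X is a nonnegative integer-valued random 
variable and $E(X) < m+1$. Then $P(X \le m)$ is greater than 0. 


This is easy to prove: 

$$m+1 > E(X) = \sum_k k\,P(X=k) \ge \sum_{k \ge m+1} (m+1)\,P(X=k) = (m+1)\,P(X \ge m+1),$$

where the inequality holds because the terms with $k \le m$ that we dropped are nonnegative 
(X takes no negative values). Thus $P(X \ge m+1) < 1$ and so $P(X \le m) = 1 - P(X \ge m+1) > 0$. 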
To apply this, we need to compute $E(X) = E(X_1) + \cdots + E(X_k)$. Let $v_i$ be the number 
of variables and their negations appearing in $C_i$. We claim that $E(X_i) = 2^{-v_i}$. Why is 
this? Note that $E(X_i)$ equals the probability that $C_i$ has the value 0. To make $C_i$ have 
the value 0, each variable in $C_i$ must be chosen correctly: 0 if it appears without being 
negated and 1 if it appears negated. The variables not appearing in $C_i$ can have any values 
whatsoever. Since each of the $v_i$ variables in $C_i$ independently takes its required value with 
probability 1/2, this probability is $2^{-v_i}$. (We assume, as in all our examples, that no 
variable appears more than once in a clause.) 


We have shown that $E(X) = 2^{-v_1} + \cdots + 2^{-v_k}$. By the First Moment Method with m = 0, we are 
done if this is less than 1. We have proved: 


Theorem 5 (SAT for few clauses) Suppose we have a conjunctive normal form 
statement $S = C_1 \wedge C_2 \wedge \cdots \wedge C_k$, where the $C_i$ are clauses in the Boolean variables 
$p_1, \ldots, p_n$. Let $v_i$ be the number of variables (and negations of variables) that appear in 
$C_i$. If $2^{-v_1} + \cdots + 2^{-v_k} < 1$, then there is a choice of values for $p_1, \ldots, p_n$ which gives S 
the value 1. 
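A brute-force check in Python confirms both the formula $E(X_i) = 2^{-v_i}$ and the theorem's conclusion. It is feasible only for small n and is purely illustrative. The signed-integer encoding of literals and the function names are our assumptions. We test it on the statement to which the theorem is applied next, read as (q ∨ r) ∧ (p ∨ ~q ∨ ~r) ∧ (~p ∨ ~q ∨ r) ∧ (~r ∨ q).

```python
from fractions import Fraction
from itertools import product

# Literals as signed integers (an assumed encoding): +i means p_i, -i means ~p_i.
# With p = 1, q = 2, r = 3, this is the statement S of Example 22.
S = [[2, 3], [1, -2, -3], [-1, -2, 3], [-3, 2]]


def first_moment_bound(clauses):
    """2^(-v_1) + ... + 2^(-v_k), assuming no variable repeats inside a clause."""
    return sum(Fraction(1, 2 ** len(c)) for c in clauses)


def brute_force(clauses, n):
    """Return (E(X), P(X = 0)) by checking all 2^n equally likely assignments."""
    false_total = 0
    satisfying = 0
    for values in product((0, 1), repeat=n):
        p = (None,) + values  # p[i] is the value of p_i
        false_here = sum(
            1 for c in clauses
            if not any(p[l] == 1 if l > 0 else p[-l] == 0 for l in c))
        false_total += false_here
        satisfying += false_here == 0
    return Fraction(false_total, 2 ** n), Fraction(satisfying, 2 ** n)


print(first_moment_bound(S))  # 3/4
print(brute_force(S, 3))      # E(X) = 3/4 and P(X = 0) = 1/4
```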


Let’s apply the theorem to 


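$$S = (q \vee r) \wedge (p \vee (\sim q) \vee (\sim r)) \wedge ((\sim p) \vee (\sim q) \vee r) \wedge ((\sim r) \vee q).$$

We have $v_1 = 2$, $v_2 = 3$, $v_3 = 3$, and $v_4 = 2$. Thus 

$$E(X) = 2^{-2} + 2^{-3} + 2^{-3} + 2^{-2} = 3/4 < 1.$$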


Thus there is a choice of variables that gives S the value 1. If you carried out the backtracking 
algorithm as you were asked to earlier, you found such an assignment. Of course, you may 
find the assignment rather easily without backtracking. However, the theorem tells us a lot 
more: It doesn’t look at the structure of the clauses, so you could change p to ~p and so 
on in any of the clauses you wish and the statement would still be satisfiable. □ 


Example 23 (Satisfiability without backtracking) Suppose the situation in the preceding 
example holds; that is, $E(X) < 1$. We want to find values for $p_1, \ldots, p_n$ that satisfy 
S (give it the value 1). We have 

$$E(X) = P(p_n = 0)\,E(X \mid p_n = 0) + P(p_n = 1)\,E(X \mid p_n = 1) = \tfrac{1}{2}E(X \mid p_n = 0) + \tfrac{1}{2}E(X \mid p_n = 1).$$

Since $E(X) < 1$, at least one of $E(X \mid p_n = 0)$ and $E(X \mid p_n = 1)$ must be less than 1. 


Suppose that $E(X \mid p_n = 0) < 1$. Set $p_n = 0$ and simplify S to get a new statement $S'$ in 
$p_1, \ldots, p_{n-1}$. To get this new statement $S'$ from S when $p_n = 0$: 


• any clause not containing $p_n$ or $\sim p_n$ is unchanged; 


• any clause containing $\sim p_n$ will have the value 1 regardless of the remaining variables 
and so is dropped; 


• any clause containing $p_n$ depends on the remaining variables for its value and so is 
kept, with $p_n$ removed. 
When $p_n = 1$, the last two cases are reversed to produce $S''$. This method will be illustrated 
soon. 


Let $X'$ be for $S'$ what X is for S. You should show that 

$$E(X') = E(X \mid p_n = 0) < 1.$$


We can now repeat the above procedure for $S'$, which will give us a value for $p_{n-1}$. 
Continuing in this way, we find values for $p_n, p_{n-1}, \ldots, p_1$. 


Let’s apply this to 

$$S = (q \vee r) \wedge (p \vee (\sim q) \vee (\sim r)) \wedge ((\sim p) \vee (\sim q) \vee r) \wedge ((\sim r) \vee q).$$

When p = 0, this reduces to 

$$(q \vee r) \wedge ((\sim q) \vee (\sim r)) \wedge ((\sim r) \vee q),$$

and so $E(X \mid p = 0) = 2^{-2} + 2^{-2} + 2^{-2} = 3/4 < 1$. Thus we can take the previous statement to 
be $S'$. Suppose we try q = 0. Then 

$$(q \vee r) \wedge ((\sim q) \vee (\sim r)) \wedge ((\sim r) \vee q)$$

reduces to $r \wedge (\sim r)$ because the middle clause disappears. The expectation is $2^{-1} + 2^{-1} = 1$, 
so this is a bad choice. (Of course this is obviously a bad choice, but we’re applying the 
method blindly, as a computer program would.) Thus we must choose q = 1. The 
statement reduces to ~r, and we choose r = 0. □ 
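The procedure can be sketched in Python as follows. This is our illustration, and the literal encoding and the names are assumptions. `assign` implements the three simplification rules, and `expectation` computes $2^{-v_1} + \cdots + 2^{-v_k}$ for the simplified statement. The main loop tries the value 0 first, as in the worked example, which sets p, then q, then r.

```python
from fractions import Fraction

# Same assumed encoding: +i means p_i, -i means ~p_i; p = 1, q = 2, r = 3.
S = [[2, 3], [1, -2, -3], [-1, -2, 3], [-3, 2]]


def expectation(clauses):
    """Expected number of 0 clauses: the sum of 2^(-v_i).

    An empty clause is already 0, and Fraction(1, 2 ** 0) = 1 counts it.
    Assumes no variable repeats inside a clause.
    """
    return sum(Fraction(1, 2 ** len(c)) for c in clauses)


def assign(clauses, var, value):
    """Simplify the statement after setting p_var = value."""
    true_literal = var if value == 1 else -var
    simplified = []
    for c in clauses:
        if true_literal in c:
            continue  # the clause is now 1 regardless of the rest: drop it
        # Otherwise keep it, removing the literal that just became 0 (if any).
        simplified.append([lit for lit in c if lit != -true_literal])
    return simplified


def satisfy_without_backtracking(clauses, order):
    """Set the variables in the given order, keeping E(X | choices) < 1."""
    if expectation(clauses) >= 1:
        raise ValueError("the method needs E(X) < 1")
    values = {}
    for var in order:
        zero = assign(clauses, var, 0)
        if expectation(zero) < 1:
            values[var], clauses = 0, zero
        else:  # then E(X | p_var = 1) < 1, since the two average to E(X) < 1
            values[var], clauses = 1, assign(clauses, var, 1)
    return values


print(satisfy_without_backtracking(S, [1, 2, 3]))  # {1: 0, 2: 1, 3: 0}
```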


Exercises for Section 3 


3.1. A box contains 3 white and 4 green balls. 


(a) Two balls are sampled with replacement. What is the probability that the 
second is white if the first is green? If the first is white? 


(b) Two balls are sampled without replacement. What is the probability that the 
second is white if the first is green? If the first is white? 


3.2. Two dice are rolled and the total is six. 
(a) What is the probability that at least one die is three? 
(b) What is the probability that at least one die is four? 
(c) What is the probability that at least one die is odd? 


3.3. In a certain college, 10 percent of the students are physical science majors, 40 
percent are engineering majors, 20 percent are biology majors and 30 percent are 
humanities majors. Of the physical science majors, 10 percent have read Hamlet, 
of the engineering majors, 50 percent have read Hamlet, of the biology majors, 
30 percent have read Hamlet, and of the humanities majors, 20 percent have read 
Hamlet. 


(a) Given that a student selected at random has read Hamlet, what is the proba- 
bility that that student is a humanities major? 


(b) Given that a student selected at random has not read Hamlet, what is the 
probability that that student is an engineering or physical science major? 


3.4. We are given an urn that has one red ball and one white ball. A fair die is thrown. 
If the number is a 1 or 2, one red ball is added to the urn. Otherwise three red 
balls are added to the urn. A ball is then drawn at random from the urn. 


(a) Given that a red ball was drawn, what is the probability that a 1 or 2 appeared 
when the die was thrown? 


(b) Given that the final composition of the urn contained more than one red ball, 
what is the probability that a 1 or 2 appeared when the die was thrown? 


3.5. A man starts with one dollar in a pot. A “play” consists of flipping a fair coin and, 
if heads occurs, a dollar is added to the pot; if tails occurs, a dollar is removed from 
the pot. The game ends if the man has zero dollars or if he has played four times. 
Let X denote the random variable which, for each outcome of the game, specifies 
the maximum amount of money that was ever in the pot, from (and including) the 
start of the game to (and including) that final outcome. What is the expected value 
E(X)? 


3.6. The probability of team A winning any game is 1/3, of B winning 2/3 (no ties in 
game play). Team A plays team B in a tournament. If either team wins two games 
in a row, that team is declared the winner. At most four games are played in the 
tournament and, if no team has won the tournament at the end of four games, 
the tournament is declared a draw. What is the expected number of games in the 
tournament? 


3.7. The platoon commander knows: 


• If the air strike is successful, there is a 60% probability that the ground forces 
will not encounter enemy fire. 


• If the air strike is not successful, there is an 80% probability that the ground 
forces will encounter enemy fire. 


• There is a 70% probability that the air strike will be successful. 


Answer the following questions. 
(a) What is the probability that the ground forces will not encounter enemy fire? 


(b) The ground forces did not encounter enemy fire. What is the probability that 
the air strike was successful? 
Section 4: Inductive Proofs and Recursive Equations 


Proof by induction, familiar from prior courses and used occasionally in earlier sections, 
is central to the study of recursive equations. We’ll begin by reviewing proof by induction. 
Then we’ll look at recursions (another name for recursive equations). The two subjects 
are related since induction proofs use smaller cases to prove larger cases and recursions 
use previous values in a sequence to compute later values. A “solution” to a recursion is 
a formula that tells us how to compute any term in the sequence without first computing 
the previous terms. We will find that it is usually easy to verify a solution to a recursion 
if someone gives it to us; however, it can be quite difficult to find the solution on our own. 
In fact, there may be no simple solution even when the recursion looks simple. 


Induction 


Suppose A(n) is an assertion that depends on n. We use induction to prove that A(n) is 
true when we show that 


• it’s true for the smallest value of n and 
• if it’s true for everything less than n, then it’s true for n. 


Closely related to proof by induction is the notion of a recursion. A recursion describes how 
to calculate a value from previously calculated values. For example, n! can be calculated 
by using n! = 1 if n = 0 and $n! = n(n-1)!$ if n > 0. 


Notice the similarity between the two ideas: There is something to get us started 
and then each new thing depends on similar previous things. Because of this similarity, 
recursions often appear in inductively proved theorems. We’ll study inductive proofs and 
recursive equations in this section. 


Inductive proofs and recursive equations are special cases of the general concept of a 
recursive approach to a problem. Thinking recursively is often fairly easy when one has 
mastered it. Unfortunately, people are sometimes defeated before reaching this level. In 
Section 3 we look at some concepts related to recursive thinking and recursive algorithms. 


We recall the theorem on induction and some related definitions: 


Theorem 6 (Induction) Let A(m) be an assertion, the nature of which is dependent 
on the integer m. Suppose that $n_0 \le n_1$. If we have proved the two statements 


(a) “A(n) is true for $n_0 \le n \le n_1$” and 
(b) “If $n > n_1$ and A(k) is true for all k such that $n_0 \le k < n$, then A(n) is true.” 


Then A(m) is true for all $m \ge n_0$. 
Let’s look at a common special case: $n_0 = n_1$ and, in proving (b), we use only A(n − 1). 
Then the theorem becomes 


Let A(m) be an assertion, the nature of which is dependent on the integer m. If 
we have proved the two statements 


(a) “$A(n_0)$ is true” and 
(b) “If $n > n_0$ and A(n − 1) is true, then A(n) is true.” 
Then A(m) is true for all $m \ge n_0$. 


Some people use terms like “weak induction”, “simple induction” and “strong induction” 
to distinguish the various types of induction. 


Definition 5 (Induction hypothesis) The statement “A(k) is true for all k such that 
$n_0 \le k < n$” is called the induction assumption or induction hypothesis and proving that 
this implies A(n) is called the inductive step. $A(n_0), \ldots, A(n_1)$ are called the base cases or 
simplest cases. 


Proof: We now prove the theorem. Suppose that A(n) is false for some $n \ge n_0$. Let 
m be the least such n. We cannot have $m \le n_1$ because (a) says that A(n) is true for 
$n_0 \le n \le n_1$. Thus $m > n_1$. 


Since m is as small as possible, A(k) is true for $n_0 \le k < m$. By (b), the inductive 
step, A(m) is also true. This contradicts our assumption that A(n) is false for some $n \ge n_0$. 
Hence the assumption is false; in other words, A(n) is never false for $n \ge n_0$. This completes 
the proof. □ 


Example 24 (Every integer is a product of primes) A positive integer n > 1 is 
called a prime if its only divisors are 1 and n. The first few primes are 2, 3, 5, 7, 11, 13, 
17, 19, 23. If a number is not a prime, such as 12, it can be written as a product of primes 
(prime factorization: 12 = 2 × 2 × 3). We adopt the terminology that a single prime p is 
a product of one prime, itself. We shall prove A(n): “every integer $n \ge 2$ is a product 
of primes.” Our proof will be by induction. We start with $n_0 = n_1 = 2$, which is a prime 
and hence a product of primes. The induction hypothesis is the following: 


“Suppose that for some n > 2, A(k) is true for all k such that $2 \le k < n$.” 


Assume the induction hypothesis and consider n. If n is a prime, then it is a product of 
primes (itself). Otherwise, n = st where 1 < s < n and 1 < t < n. By the induction 
hypothesis, s and t are each a product of primes, hence n = st is a product of primes. □ 
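The proof above is really a recursive algorithm. To factor n, either recognize it as prime or split it as n = st and factor the smaller numbers s and t, which the induction hypothesis covers. A minimal Python sketch (the name `prime_factors` is ours) makes the correspondence concrete.

```python
from math import isqrt


def prime_factors(n):
    """Write n >= 2 as a list of primes, following the proof of Example 24.

    The recursive calls are on factors s, t with 2 <= s, t < n, exactly the cases
    the induction hypothesis covers; a prime is returned as a product of one prime.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    for s in range(2, isqrt(n) + 1):
        if n % s == 0:  # n = s * t with 1 < s < n and 1 < t < n
            return prime_factors(s) + prime_factors(n // s)
    return [n]  # no divisor up to sqrt(n), so n is prime


print(prime_factors(12))  # [2, 2, 3]
```

Searching for a divisor only up to $\sqrt{n}$ is a design choice. A composite n always has a divisor in that range, and the proof needs only some factorization n = st.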


In the example just given, we needed the induction hypothesis “for all k such that 
$2 \le k < n$.” In the next example we have the more common situation where we only need 
to assume “for k = n − 1.” We can still make the stronger assumption and the proof is 
valid, but the stronger assumption is not used; in fact, we are using the simpler form of 
induction described after the theorem. 
Example 25 (Sum of first n integers) We would like a formula for the sum of the 
first n integers. Let us write $S(n) = 1 + 2 + \cdots + n$ for the value of the sum. By a little 
calculation, 

$$S(1) = 1,\quad S(2) = 3,\quad S(3) = 6,\quad S(4) = 10,\quad S(5) = 15,\quad S(6) = 21.$$

What is the general pattern? It turns out that $S(n) = \frac{n(n+1)}{2}$ is correct for $1 \le n \le 6$. Is 
it true in general? This is a perfect candidate for an induction proof with 

$$n_0 = n_1 = 1 \quad\text{and}\quad A(n):\ S(n) = \frac{n(n+1)}{2}.$$

Let’s prove it. We have shown that A(1) is true. In this case we need only the restricted 
induction hypothesis; that is, we will prove the formula for S(n) by assuming the formula 
for k = n − 1. Thus, we assume only A(n − 1) is true. Here it is (the inductive step): 

$$\begin{aligned}
S(n) &= 1 + 2 + \cdots + n && \text{by the definition of } S(n)\\
&= \bigl(1 + 2 + \cdots + (n-1)\bigr) + n\\
&= S(n-1) + n && \text{by definition of } S(n-1),\\
&= \frac{(n-1)\bigl((n-1)+1\bigr)}{2} + n && \text{by } A(n-1),\\
&= \frac{n(n+1)}{2} && \text{by algebra.}
\end{aligned}$$

This completes the proof. We call your attention to the fact that, in the third line, we 
proved $S(n) = S(n-1) + n$. □ 


Recursive Equations 


The equation S(n) = S(n—1)+n (for n > 1) that arose in the inductive proof in the 
preceding example is called a recurrence relation, recursion, or recursive equation. A 
recursion is not complete unless there is information on how to get started. In this case the 
information was S(1) = 1. This information is called the initial condition or, if there is more 
than one, initial conditions. Many examples of such recurrence relations occur in computer 
science and mathematics. We discussed recurrence relations in Section 3 of Unit CL (Basic 
Counting and Listing) for binomial coefficients C(n,k) and Stirling numbers S(n, k). 


In the preceding example, we found that S(n) = n(n +1)/2. This is a solution to the 
recursion because it tells us how to compute S(n) without having to compute S(k) for any 
other values of k. If we had used the recursion $S(n) = S(n-1) + n$, we would have had to 
compute S(n — 1), which requires S(n — 2), and so on all the way back to S(1). 


A recursion tells us how to compute values in a sequence $a_n$ from earlier values 
$a_{n-1}, a_{n-2}, \ldots$ and n. We can denote this symbolically by writing $a_n = G(n, a_{n-1}, a_{n-2}, \ldots)$. 
For example, in the case of the sum of the first n integers, which we called S(n), we would 
have 

$$a_n = S(n) \quad\text{and}\quad G = a_{n-1} + n \quad\text{since}\quad S(n) = S(n-1) + n.$$
Induction proofs deduce the truth of A(n) from earlier statements. Thus it’s natural 
to use induction to prove that a formula for the solution to a recursion is correct. That’s 
what we did in the previous example. There’s a way to avoid giving an inductive proof 
each time we have such a problem: It turns out that the induction proofs for solutions to 
recursions all have the same form. A general pattern often means there’s a general theorem. 
If we can find and prove the theorem, then we can use it to avoid giving an inductive 
proof in each special case. That’s what the following theorem is about. (The $a_n$ and f(n) 
of the theorem are generalizations of S(n) and $\frac{n(n+1)}{2}$ from the previous example.) 


Theorem 7 (Verifying the solution of a recursion) Suppose we have initial conditions 
that give $a_n$ for $n_0 \le n \le n_1$ and a recursion that allows us to compute $a_n$ when $n > n_1$. 
To verify that $a_n = f(n)$, it suffices to do two things: 


Step 1. Verify that f satisfies the initial conditions. 
Step 2. Verify that f satisfies the recursion. 


Proof: The goal of this theorem is to take care of the inductive part of proving that a 
formula is the solution to a recursion. Thus we will have to prove it by induction. We must 
verify (a) and (b) in Theorem 6. Let A(n) be the assertion “$a_n = f(n)$.” By Step 1, A(n) is 
true for $n_0 \le n \le n_1$, which proves (a). Suppose the recursion is $a_n = G(n, a_{n-1}, a_{n-2}, \ldots)$ 
for some formula G. We have 

$$\begin{aligned}
f(n) &= G(n, f(n-1), f(n-2), \ldots) && \text{by Step 2,}\\
&= G(n, a_{n-1}, a_{n-2}, \ldots) && \text{by } A(k) \text{ for } k < n,\\
&= a_n && \text{by the recursion for } a_n.
\end{aligned}$$

This proves (b) and so completes the proof. □ 


Example 26 (Proving a formula for the solution of a recursion) Let S(n) be 
the sum of the first n integers. The initial condition S(1) = 1 and the recursion S(n) = 
n + S(n − 1) allow us to compute S(n) for all $n \ge 1$. It is claimed that $f(n) = \frac{n(n+1)}{2}$ 
equals S(n). 


The initial condition is for n = 1. Thus $n_0 = n_1 = 1$. Since f(1) = 1, f satisfies the 
initial condition. (This is Step 1.) For n > 1 we have 

$$n + f(n-1) = n + \frac{(n-1)n}{2} = \frac{n(n+1)}{2} = f(n),$$


and so f satisfies the recursion. (This is Step 2.) 


We now consider a different problem. Suppose we are given that 

$$a_0 = 2,\quad a_1 = 7,\quad\text{and}\quad a_n = 3a_{n-1} - 2a_{n-2} \text{ when } n > 1,$$

and we are asked to prove that $a_n = 5 \times 2^n - 3$ for $n \ge 0$. 
Let’s verify that the formula is correct for n = 0 and n = 1 (the initial conditions, 
Step 1 in our theorem): 

$$n = 0:\ a_0 = 2 = 5 \times 2^0 - 3, \qquad n = 1:\ a_1 = 7 = 5 \times 2^1 - 3.$$

Now for Step 2, the recursion. Let $f(n) = 5 \times 2^n - 3$ and assume that n > 1. We have 

$$\begin{aligned}
3f(n-1) - 2f(n-2) &= 3(5 \times 2^{n-1} - 3) - 2(5 \times 2^{n-2} - 3)\\
&= (3 \times 5 \times 2 - 2 \times 5)\,2^{n-2} - 3\\
&= 5 \times 4 \times 2^{n-2} - 3 = f(n).
\end{aligned}$$

This completes the proof. 


As a final example, suppose $b_0 = b_1 = 1$ and $b_{n+1} = n(b_n + b_{n-1})$ for $n \ge 1$. We 
want to prove that $b_n = n!$. Since our theorem stated the recursion for $a_n$, let’s rewrite our 
recursion to avoid confusion. Let n + 1 = k in the recursion to get $b_k = (k-1)(b_{k-1} + b_{k-2})$. 
The initial conditions are $b_0 = 1 = 0!$ and $b_1 = 1 = 1!$, so we’ve done Step 1. Now for 
Step 2: 

Is $k! = (k-1)\bigl((k-1)! + (k-2)!\bigr)$ true? 

Yes because $(k-1)! = (k-1) \times (k-2)!$ and so $(k-1)! + (k-2)! = \bigl((k-1)+1\bigr)(k-2)! = k \times (k-2)!$. 
Hence $(k-1)\bigl((k-1)! + (k-2)!\bigr) = (k-1) \times k \times (k-2)! = k!$. We could have done this without 
changing the subscripts in the recursion: Just check that $(n+1)! = n\bigl(n! + (n-1)!\bigr)$. We’ll let you do that. □ 
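Before attempting Steps 1 and 2, it can help to test a claimed solution numerically. The Python sketch below (the name `agrees` is hypothetical) computes $a_n$ from the initial conditions and the recursion and compares it with f(n) for the three examples just treated. Agreement for finitely many n is only evidence. Theorem 7 is what turns Steps 1 and 2 into a proof.

```python
from math import factorial


def agrees(initial, recursion, f, n_max):
    """Check a_n == f(n) for every n up to n_max.

    initial maps n to a_n for n0 <= n <= n1; recursion(n, a) returns a_n
    from the dictionary a of earlier values. Agreement is evidence only.
    """
    a = dict(initial)
    for n in range(max(a) + 1, n_max + 1):
        a[n] = recursion(n, a)
    return all(a[n] == f(n) for n in a)


# S(n) = n(n+1)/2, then a_n = 5 * 2^n - 3, then b_k = k!.
print(agrees({1: 1}, lambda n, a: n + a[n - 1], lambda n: n * (n + 1) // 2, 50))  # True
print(agrees({0: 2, 1: 7}, lambda n, a: 3 * a[n - 1] - 2 * a[n - 2],
             lambda n: 5 * 2 ** n - 3, 50))  # True
print(agrees({0: 1, 1: 1}, lambda k, a: (k - 1) * (a[k - 1] + a[k - 2]),
             factorial, 50))  # True
```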


So far we have a method for checking the solution to a recursion, which we just used 
in the previous example. How can we find a solution in the first place? If we’re lucky, 
someone will tell us. If we’re unlucky, we need a clever guess or some tools. Let’s look at 
how we might guess. 


Example 27 (Guessing solutions to recurrence relations) 


(1) Let $r_k = -r_{k-1}/k$ for $k \ge 1$, with $r_0 = 1$. Writing out the first few terms gives 
$1, -1, 1/2, -1/6, 1/24, \ldots$. Guessing, it looks like $r_k = (-1)^k/k!$ is a solution. 


(2) Let $t_k = 2t_{k-1} + 1$ for k > 0, $t_0 = 0$. Writing out some terms gives 0, 1, 3, 7, 15, .... It 
looks like $t_k = 2^k - 1$, for $k \ge 0$. 

(3) What is the solution to $a_0 = 0$, $a_1 = 1$ and $a_n = 4a_{n-1} - 4a_{n-2}$ for $n \ge 2$? Let’s 
compute some values: 


n:     0  1  2   3   4   5    6    7
a_n:   0  1  4  12  32  80  192  448


These numbers factor nicely: $4 = 2^2$, $12 = 2^2 \times 3$, $32 = 2^5$, $80 = 2^4 \times 5$, $192 = 2^6 \times 3$, 
$448 = 2^6 \times 7$. Can we see a pattern here? We can pull out a factor of $2^{n-1}$ from $a_n$: 


n:             0  1  2   3   4   5    6    7
a_n:           0  1  4  12  32  80  192  448
a_n/2^(n-1):   0  1  2   3   4   5    6    7


Now the pattern is clear: $a_n = n2^{n-1}$. That was a lot of work, but we’re not done 
yet: this is just a guess. We have to prove it. You can use the theorem to do that. 
We’ll do it a different way in a little while. 


(4) Let $b_n = b_1b_{n-1} + b_2b_{n-2} + \cdots + b_{n-1}b_1$ for $n \ge 2$, with $b_1 = 1$. Here are the first few 
terms: 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, .... Each term is around 3 or 4 times the 
preceding one. Let’s compute the ratio exactly: 


n:             4     5  6     7     8     9    10
b_n/b_(n-1):   5/2  14/5  3  22/7  13/4  10/3  17/5


These ratios have surprisingly small numerators and denominators. Can we find a 
pattern? The large primes 13 and 17 in the numerators for n = 8 and 10 suggest that 
maybe we should look for 2n − 3 in the numerator.¹ Let’s adjust our ratios accordingly: 


n:                     2    3    4    5    6    7    8    9   10
b_n/((2n-3)b_(n-1)):   1  2/3  1/2  2/5  1/3  2/7  1/4  2/9  1/5


Aha! These numbers are just 2/n. Our table leads us to guess $b_n = 2(2n-3)b_{n-1}/n$, 
a much simpler recursion than the one we started with. 


This recursion is so simple we can “unroll” it: 

$$b_n = \frac{2(2n-3)}{n}\,b_{n-1} = \frac{2(2n-3)}{n}\cdot\frac{2(2n-5)}{n-1}\,b_{n-2} = \cdots = \frac{2^{n-1}(2n-3)(2n-5)\cdots 1}{n!}.$$


This is a fairly simple formula. Of course, it is still only a conjecture and it is not easy 
to prove that it is the solution to the original recursion because the computations in 
Theorem 7 using this formula and the recursion $b_n = b_1b_{n-1} + b_2b_{n-2} + \cdots + b_{n-1}b_1$ 
would be very messy. 


(5) Let $d_n = (n-1)d_{n-1} + (n-1)d_{n-2}$ for $n \ge 2$, with $d_0 = 1$ and $d_1 = 0$. In the previous 
example, we looked at a recursion that was almost like this: The only difference was 
that $d_1 = 1$. In that case we were told that the answer was n!, so maybe these numbers 
look like n!. If this were like n!, we’d expect $nd_{n-1}$ to equal $d_n$. Here are the first few 
values of $d_n$ together with $nd_{n-1}$: 


n:          0  1  2  3  4   5    6
d_n:        1  0  1  2  9  44  265
n d_(n-1):  -  1  0  3  8  45  264


We’re close! The values of $d_n$ and $nd_{n-1}$ only differ by 1. Thus we are led to guess that 
$d_n = nd_{n-1} + (-1)^n$. This is not a solution; it’s another recursion. Nevertheless, we 
might prefer it because it’s a bit simpler than the one we started with. □ 
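The computations in this example are easy to automate. This Python sketch (names are ours) generates the first terms of each sequence with exact arithmetic. It then tests each guess, including the unrolled formula in (4) and the new recursion in (5). As in the text, agreement on finitely many terms supports a guess but does not prove it.

```python
from fractions import Fraction
from math import factorial, prod


def terms(initial, rule, count):
    """List the first `count` terms: seq[n] = rule(n, seq) after the initial ones."""
    seq = list(initial)
    while len(seq) < count:
        seq.append(rule(len(seq), seq))
    return seq


r = terms([Fraction(1)], lambda k, s: -s[k - 1] / k, 10)             # (1)
t = terms([0], lambda k, s: 2 * s[k - 1] + 1, 10)                    # (2)
a = terms([0, 1], lambda n, s: 4 * s[n - 1] - 4 * s[n - 2], 10)      # (3)
# (4): index 0 is a dummy so that b[n] is b_n; the sum runs over i = 1..n-1.
b = terms([0, 1], lambda n, s: sum(s[i] * s[n - i] for i in range(1, n)), 11)
d = terms([1, 0], lambda n, s: (n - 1) * (s[n - 1] + s[n - 2]), 10)  # (5)

guesses = {
    "(1) r_k = (-1)^k / k!":
        all(r[k] == Fraction((-1) ** k, factorial(k)) for k in range(10)),
    "(2) t_k = 2^k - 1": all(t[k] == 2 ** k - 1 for k in range(10)),
    "(3) a_n = n 2^(n-1)": all(a[n] == n * 2 ** (n - 1) for n in range(1, 10)),
    "(4) b_n = 2^(n-1) (2n-3)(2n-5)...1 / n!": all(
        b[n] == Fraction(2 ** (n - 1) * prod(range(1, 2 * n - 2, 2)), factorial(n))
        for n in range(1, 11)),
    "(5) d_n = n d_(n-1) + (-1)^n":
        all(d[n] == n * d[n - 1] + (-1) ** n for n in range(1, 10)),
}
for name, ok in guesses.items():
    print(name, ok)
```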


As you can see from the previous example, guessing solutions to recursions can be 
difficult. Now we’ll look at a couple of theorems that tell us the solutions without any 
guessing. 


¹ “Why look at large primes?” you ask. Because they are less likely to have come from 
a larger number that has lost a factor due to reduction of the fraction. 
Theorem 8 (Solutions to Some Recursions) Let $a_0, a_1, \ldots, a_n, \ldots$ be a sequence of 
numbers. Suppose there are constants b and c such that b is not 0 or 1 and $a_n = ba_{n-1} + c$ 
for $n \ge 1$. Then 

$$a_n = Ab^n + K \quad\text{where}\quad K = \frac{c}{1-b} \quad\text{and}\quad A = a_0 - K = a_0 - \frac{c}{1-b}.$$

This gives us the solution to the recursion $t_k = 2t_{k-1} + 1$ (with $t_0 = 0$) of the previous 
example: $K = \frac{1}{1-2} = -1$ and $A = 0 - (-1) = 1$. That gives the solution $t_k = 2^k - 1$, no 
guessing needed! 
Proof: (of Theorem 8) We’ll use Theorem 7. The initial condition is simple: 

$$Ab^0 + K = A + K = (a_0 - K) + K = a_0.$$

That’s Step 1. For Step 2 we want to show that $a_n = Ab^n + K$ satisfies the recursion. We 
have 

$$\begin{aligned}
b(Ab^{n-1} + K) + c &= Ab^n + bK + c = Ab^n + \frac{bc}{1-b} + c\\
&= Ab^n + \frac{bc + (1-b)c}{1-b} = Ab^n + \frac{c}{1-b} = Ab^n + K.
\end{aligned}$$

We’re done. □ 


Example 28 (I forgot the formulas for A and K!) If you remember the $b^n$, you can 
still solve the recursion even if the initial condition is not at $a_0$. Let’s do the example 

$$a_1 = 3 \quad\text{and}\quad a_n = 4a_{n-1} - 7 \text{ for } n > 1.$$

The solution will be $a_n = A \cdot 4^n + K$ for some A and K. If we know $a_n$ for two values of n, 
then we can solve for A and K. We’re given $a_1 = 3$ and we compute $a_2 = 4 \times 3 - 7 = 5$. 
Thus 

$$a_1 \text{ gives us } 3 = A \cdot 4^1 + K \quad\text{and}\quad a_2 \text{ gives us } 5 = A \cdot 4^2 + K.$$

Subtracting the first equation from the second: 12A = 2 so A = 1/6. From $a_1$, 3 = 4/6 + K 
and so K = 7/3. □ 
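Theorem 8 and the trick of Example 28 fit in a few lines of Python. In this sketch (the name `solve_first_order` is hypothetical), K comes from Theorem 8 and A is then fixed by one known value $a_{n_0}$. This handles a starting index other than 0 too, and fractions keep the answers exact.

```python
from fractions import Fraction


def solve_first_order(b, c, n0, a_n0):
    """Return (A, K) with a_n = A*b**n + K for a_n = b*a_(n-1) + c.

    K = c/(1-b) as in Theorem 8; A is fixed by the known value a_(n0).
    """
    b, c = Fraction(b), Fraction(c)
    if b in (0, 1):
        raise ValueError("Theorem 8 needs b different from 0 and 1")
    K = c / (1 - b)
    A = (Fraction(a_n0) - K) / b ** n0
    return A, K


A, K = solve_first_order(2, 1, 0, 0)
print(f"t_k: A = {A}, K = {K}")         # t_k: A = 1, K = -1
A, K = solve_first_order(4, -7, 1, 3)
print(f"Example 28: A = {A}, K = {K}")  # Example 28: A = 1/6, K = 7/3
```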


Now let’s look at recursions where $a_n$ depends on $a_{n-1}$ and $a_{n-2}$ in a simple way. 


Theorem 9 (Solutions to Some Recursions) Let $a_0, a_1, \ldots, a_n, \ldots$ be a sequence of 
numbers. Suppose there are constants b and c such that $a_n = ba_{n-1} + ca_{n-2}$ for $n \ge 2$. 
Let $r_1$ and $r_2$ be the roots of the polynomial $x^2 - bx - c$. 


• If $r_1 \ne r_2$, then $a_n = K_1 r_1^n + K_2 r_2^n$ for $n \ge 0$, where $K_1$ and $K_2$ are solutions to the 
equations 

$$K_1 + K_2 = a_0 \quad\text{and}\quad r_1K_1 + r_2K_2 = a_1.$$

• If $r_1 = r_2$, then $a_n = K_1 r_1^n + K_2 n r_1^n$ for $n \ge 0$, where $K_1$ and $K_2$ are solutions to the 
equations 

$$K_1 = a_0 \quad\text{and}\quad r_1K_1 + r_2K_2 = r_1K_1 + r_1K_2 = a_1.$$


The equation $x^2 - bx - c = 0$ is called the characteristic equation of the recursion. 


Before proving the theorem, we give some examples. In all cases, the roots of $x^2 - bx - c$ 
can be found either by factoring it or by using the quadratic formula 

$$r_1, r_2 = \frac{b \pm \sqrt{b^2 + 4c}}{2}.$$


Example 29 (Applying Theorem 9) Let's redo the recursion

$$a_0 = 2, \quad a_1 = 7, \quad\text{and}\quad a_n = 3a_{n-1} - 2a_{n-2} \ \text{ when } n > 1$$

from Example 26. We have $b = 3$ and $c = -2$. The characteristic equation is $x^2 - 3x + 2 = 0$. The roots of $x^2 - 3x + 2$ are $r_1 = 2$ and $r_2 = 1$, which you can get by using the quadratic formula or by factoring $x^2 - 3x + 2$ into $(x - 2)(x - 1)$. Since $r_1 \ne r_2$, we are in the first case in the theorem. Thus we have to solve

$$K_1 + K_2 = 2 \quad\text{and}\quad 2K_1 + K_2 = 7.$$

The solution is $K_1 = 5$ and $K_2 = -3$ and so $a_n = 5 \times 2^n - 3 \times 1^n = 5 \times 2^n - 3$.


• As another example, we'll solve the recursion $a_0 = 0$, $a_1 = 1$, and $a_n = 4a_{n-1} - 4a_{n-2}$ for $n \ge 2$. Applying the theorem, $r_1 = r_2 = 2$ and so $a_n = K_1 2^n + K_2 n 2^n$ where $K_1 = 0$ and $2K_1 + 2K_2 = 1$. Thus $K_1 = 0$, $K_2 = 1/2$, and $a_n = (1/2) n 2^n = n 2^{n-1}$.


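• As a final example, consider the recursion

$$F_0 = F_1 = 1 \quad\text{and}\quad F_k = F_{k-1} + F_{k-2} \ \text{ when } k \ge 2.$$

This is called the Fibonacci recursion. We want to find an explicit formula for $F_k$.

The characteristic equation is $x^2 - x - 1 = 0$. By the quadratic formula, its roots are $r_1 = \frac{1+\sqrt5}{2}$ and $r_2 = \frac{1-\sqrt5}{2}$. Thus, we need to solve the equations

$$K_1 + K_2 = 1 \quad\text{and}\quad r_1 K_1 + r_2 K_2 = 1.$$

High school math gives

$$K_1 = \frac{r_1}{\sqrt5} = \frac{1+\sqrt5}{2\sqrt5} \quad\text{and}\quad K_2 = -\frac{r_2}{\sqrt5} = -\frac{1-\sqrt5}{2\sqrt5}.$$

Thus

$$F_k = \frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{k+1} - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{k+1}.$$

It would be difficult to guess this solution from a few values of $F_k$! □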
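The three answers in Example 29 are easy to test against the recursions themselves. In this sketch (the helper names are ours) the first two closed forms are checked with exact integer arithmetic, while the Fibonacci formula, which involves $\sqrt5$, is evaluated in floating point and rounded to the nearest integer before comparing. That rounding is only safe for moderate $k$, so the loop stops at $k = 39$.

```python
import math


def iterate2(a0, a1, b, c, n):
    """Return a_n for a_n = b*a_{n-1} + c*a_{n-2} (n >= 2), given a_0 and a_1."""
    if n == 0:
        return a0
    prev, cur = a0, a1
    for _ in range(n - 1):
        prev, cur = cur, b * cur + c * prev
    return cur


SQRT5 = math.sqrt(5)
R1, R2 = (1 + SQRT5) / 2, (1 - SQRT5) / 2


def fibonacci_closed_form(k):
    """F_k = (r1^(k+1) - r2^(k+1)) / sqrt(5), with F_0 = F_1 = 1 (floating point)."""
    return (R1 ** (k + 1) - R2 ** (k + 1)) / SQRT5


for n in range(40):
    # a_0 = 2, a_1 = 7, a_n = 3a_{n-1} - 2a_{n-2}:  a_n = 5*2^n - 3
    assert iterate2(2, 7, 3, -2, n) == 5 * 2**n - 3
    # a_0 = 0, a_1 = 1, a_n = 4a_{n-1} - 4a_{n-2}:  a_n = n*2^(n-1)
    assert 2 * iterate2(0, 1, 4, -4, n) == n * 2**n
    # Fibonacci: round away the floating-point error before comparing.
    assert round(fibonacci_closed_form(n)) == iterate2(1, 1, 1, 1, n)
```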
Example 30 (A shifted index) Let's solve the recursion

$$a_1 = 0, \quad a_2 = 1 \quad\text{and}\quad a_n = 5a_{n-1} + 6a_{n-2} \ \text{ for } n \ge 3.$$

This doesn't quite fit the theorem since it starts with $a_1$ instead of $a_0$. What can we do? The same thing we did in Example 28: Use values of $a_n$ to get two equations in the two unknowns $K_1$ and $K_2$.

Let's do this. The characteristic equation $x^2 - 5x - 6 = 0$ gives us $r_1 = 6$ and $r_2 = -1$ and so $a_n = K_1 6^n + K_2 (-1)^n$. Using $a_1$ and $a_2$:

$$a_1 \text{ gives us } 0 = 6K_1 - K_2 \quad\text{and}\quad a_2 \text{ gives us } 1 = 36K_1 + K_2.$$

Adding the two equations: $1 = 42K_1$. Thus $K_1 = 1/42$. Then $a_1$ gives us $0 = 6/42 - K_2$ and so $K_2 = 1/7$. Thus $a_n = (1/42)6^n + (1/7)(-1)^n$. □
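A similar check works for Example 30, whose values start at $a_1$ rather than $a_0$. Exact fractions are used so that the comparison involves no rounding; the function name is illustrative only.

```python
from fractions import Fraction


def example30_closed_form(n):
    """a_n = (1/42) 6^n + (1/7) (-1)^n, valid for n >= 1."""
    return Fraction(6**n, 42) + Fraction((-1) ** n, 7)


a = {1: 0, 2: 1}          # the given starting values a_1 and a_2
for n in range(3, 25):
    a[n] = 5 * a[n - 1] + 6 * a[n - 2]

assert all(example30_closed_form(n) == a[n] for n in a)
print(a[3], a[4], a[5])  # 5 31 185
```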


We conclude this section with a proof of Theorem 9.

Proof: (of Theorem 9) We apply Theorem 7 with $n_0 = 0$ and $n_1 = 1$.

We first assume that $r_1 \ne r_2$ and we set $f(n) = K_1 r_1^n + K_2 r_2^n$ where $K_1$ and $K_2$ are as given by the theorem. Step 1 is simple because the equations for $K_1$ and $K_2$ are simply the equations $f(0) = a_0$ and $f(1) = a_1$. Here's Step 2:

$$\begin{aligned}
b f(n-1) + c f(n-2) &= b\,(K_1 r_1^{n-1} + K_2 r_2^{n-1}) + c\,(K_1 r_1^{n-2} + K_2 r_2^{n-2}) \\
&= K_1 r_1^{n-2}(b r_1 + c) + K_2 r_2^{n-2}(b r_2 + c) \\
&= K_1 r_1^{n-2} r_1^2 + K_2 r_2^{n-2} r_2^2 \\
&= f(n).
\end{aligned}$$

The third line uses $r_i^2 = b r_i + c$, which holds because $r_i$ is a root of $x^2 - bx - c$.

Wait! Something must be wrong: the theorem says $r_1 \ne r_2$ and we never use that fact! What happened? We assumed that the equations could be solved for $K_1$ and $K_2$. How do we know that they have a solution? One way is to actually solve them using high school algebra. We find that

$$K_1 = \frac{a_0 r_2 - a_1}{r_2 - r_1} \quad\text{and}\quad K_2 = \frac{a_0 r_1 - a_1}{r_1 - r_2}.$$

Now we can see where $r_1 \ne r_2$ is needed: The denominators in these formulas must be nonzero.

We now consider the case $r_1 = r_2$. This is similar to $r_1 \ne r_2$. We sketch the ideas and leave it to you to fill in the details of the proof. Here it's clear that we can solve the equations for $K_1$ and $K_2$. Step 1 in Theorem 7 is checked as it was for the $r_1 \ne r_2$ case. Step 2 requires algebra similar to that needed for $r_1 \ne r_2$. The only difference is that we end up needing to show that

$$K_2 r_1^{n-2}\big((n-1) b r_1 + (n-2) c\big) = K_2 n r_1^n.$$

You should be able to see that this is the same as showing $-b r_1 - 2c = 0$. This follows from the fact that the only way we can have $r_1 = r_2$ is to have $\sqrt{b^2 + 4c} = 0$. In this case $r_1 = b/2$, so $-b r_1 - 2c = -(b^2 + 4c)/2 = 0$. □
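The proof also gives explicit formulas for $K_1$ and $K_2$ when $r_1 \ne r_2$, so Theorem 9 can be packaged as a small function. The sketch below is our own illustration, not an algorithm from the text: it computes the roots with the quadratic formula using Python's cmath module (so that a negative $b^2 + 4c$ also works), uses the proof's formulas in the distinct-root case and $K_1 = a_0$, $r_1 K_1 + r_1 K_2 = a_1$ in the repeated-root case, and compares the result with direct iteration up to a floating-point tolerance.

```python
import cmath


def theorem9_solution(a0, a1, b, c):
    """Return a function n -> a_n for a_n = b*a_{n-1} + c*a_{n-2} (n >= 2).

    The roots of x^2 - b x - c come from the quadratic formula.  Complex
    arithmetic (cmath) is used so that a negative b^2 + 4c also works;
    values are then approximate floating-point numbers.
    """
    disc = b * b + 4 * c
    if disc != 0:  # r1 != r2: use the formulas for K1, K2 from the proof
        r1 = (b + cmath.sqrt(disc)) / 2
        r2 = (b - cmath.sqrt(disc)) / 2
        K1 = (a0 * r2 - a1) / (r2 - r1)
        K2 = (a0 * r1 - a1) / (r1 - r2)
        return lambda n: K1 * r1**n + K2 * r2**n
    r = b / 2  # r1 = r2 = b/2
    if r == 0:
        raise ValueError("b = c = 0: the recursion just gives a_n = 0 for n >= 2")
    K1 = a0
    K2 = (a1 - r * K1) / r  # from r*K1 + r*K2 = a1
    return lambda n: K1 * r**n + K2 * n * r**n


def matches_iteration(a0, a1, b, c, count=25, tol=1e-6):
    """Compare the closed form with direct iteration for n = 0, ..., count-1."""
    f = theorem9_solution(a0, a1, b, c)
    values = [a0, a1]
    while len(values) < count:
        values.append(b * values[-1] + c * values[-2])
    return all(abs(f(n) - v) <= tol * max(1, abs(v)) for n, v in enumerate(values))


print(matches_iteration(2, 7, 3, -2))   # Example 29, distinct roots 2 and 1
print(matches_iteration(0, 1, 4, -4))   # repeated root 2
print(matches_iteration(1, 0, 0, -1))   # complex roots i and -i
```

The last case ($a_n = -a_{n-2}$, with roots $i$ and $-i$) goes beyond the examples of this section, but the distinct-root argument in the proof never assumed the roots were real, so the same formulas apply.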
Exercises for Section 4

4.1. Compute $a_0$, $a_1$, $a_3$ and $a_4$ for the following recursions. (Recall that $\lfloor x \rfloor$ is the greatest integer not exceeding $x$. For example $\lfloor 5.4 \rfloor = 5$ and $\lfloor -5.4 \rfloor = -6$.)

(a) ay = 1 ag =] 30427 — 2 for n> 1.

(b) $a_0 = 0$, $a_n = \lfloor n/2 \rfloor + a_{n-1}$ for $n > 0$.

(c) $a_0 = 1$, $a_n = n + a_{\lfloor n/2 \rfloor}$ for $n > 0$.

(d) e9 =O0paj-= 1, Gye 2 min (ag 245.55 Gp dae ke es9 a1 ) for n> 1.

4.2. We computed the first few values of some sequences that were defined by recursions. A table of values is given below. Guess simple formulas for each sequence.

n:     0   1   2   3   4    5
a_n:   0   0   1   1   2    2
b_n:   1  -1   2  -2   3   -3
c_n:   1   2   5  10  17   26
d_n:   1   1   2   6  24  120

4.3. What are the characteristic equations for the recursions $a_n = 6a_{n-1} - 5a_{n-2}$, $a_n = a_{n-1} + 2a_{n-2}$ and $a_n = 5(a_{n-1} + a_{n-2})$? What are the roots of these equations?

4.4. Solve the recursion $a_0 = 0$, $a_1 = 3$ and $a_n = 6a_{n-1} - 9a_{n-2}$ for $n \ge 2$.

4.5. Solve the recursion $a_2 = 1$, $a_3 = 3$ and $a_n = 3a_{n-1} - 2a_{n-2}$ for $n > 3$.

4.6. Solve the recursion $a_k = 2a_{k-1} - a_{k-2}$, $k \ge 2$, $a_0 = 2$, $a_1 = 1$.

4.7. Suppose $A \ne 1$. Let $G(n) = 1 + A + A^2 + \cdots + A^{n-1}$ for $n \ge 1$.

(a) Using induction, prove that $G(n) = (1 - A^n)/(1 - A)$ for $n \ge 1$. (This is the formula for the sum of a geometric series.)

(b) Obtain a simple recursion for $G(n)$ from $G(n) = 1 + A + A^2 + \cdots + A^{n-1}$, including initial conditions.

(c) Use the recursion in (b) and Theorem 7 to prove that $G(n) = (1 - A^n)/(1 - A)$ for $n \ge 1$.

(d) By setting $A = y/x$ and doing some algebra, prove that
$$\frac{x^{k+1} - y^{k+1}}{x - y} = x^k y^0 + x^{k-1} y^1 + \cdots + x^0 y^k \quad\text{when } x \ne y.$$
4.8. In each of the following, find an explicit formula for $a_k$ that satisfies the given recursion. Prove your formula.

(a) $a_k = a_{k-1}/(1 + a_{k-1})$ for $k \ge 1$, $a_0 = A > 0$.
(b) $a_k = A a_{k-1} + B$, $k \ge 1$, $a_0 = C$.

4.9. Consider $a_k = a_{k-1} + Bk(k-1)$, $k \ge 1$, $a_0 = A$. Prove that $a_k = A + Bk(k^2 - 1)/3$, $k \ge 0$, is the solution to this recursion.

4.10. Consider $a_k = A2^k - a_{k-1}$, $k \ge 1$, $a_0 = C$.

(a) Prove that
$$a_k = A\left(2^k - 2^{k-1} + \cdots + (-1)^{k-1} 2\right) + (-1)^k C, \quad k \ge 1,$$
is the solution to this recursion.

(b) Write the formula for $a_k$ more compactly using Exercise 4.7.

4.11. A gambler has $t > 0$ dollars to start with. He bets one dollar each time a fair coin is tossed. If he wins $Q$ dollars, where $Q > t$, he quits, a happy man. If he loses all his money he quits, a sad man. What is the probability $q_t$ that he wins $Q$ dollars instead of losing all his money and quits a happy man? What is the probability $p_t$ that he loses all his money and quits a sad man (i.e., ruined)? This problem is called the Gambler's Ruin problem.
Multiple Choice Questions for Review

1. In each case, two permutations on 6 are listed. In which case is the first permutation less than the second in direct insertion order?

(a) 2,3,1,4,5,6    1,3,2,4,5,6
(b) 2,3,1,4,5,6    2,1,3,4,5,6
(c) 2,3,1,4,5,6    4,5,6,1,3,2
(d) 6,1,2,3,4,5    2,1,3,4,5,6
(e) 6,2,3,1,4,5    2,3,1,4,5,6

2. What is the rank, in direct insertion order, of the permutation 5,4,6,3,2,1?
(a) 3 (b) 4 (c) 715 (d) 716 (e) 717

3. What is the rank, in lex order, of the permutation 6,1,2,3,4,5?
(a) 20 (b) 30 (c) 480 (d) 600 (e) 619

4. Consider the list of all sequences of length six of A's and B's that satisfy the following conditions:

(i) There are no two adjacent A's.
(ii) There are never three B's adjacent.

What is the next sequence after ABBABB in lex order?
(a) ABABAB
(b) ABBABA
(c) BABABA
(d) BABBAB
(e) BBABBA

5. Which of the following 4 × 4 domino covers represent two distinct hibachi grills?
(a) hhhhhvvh and hvvhhhhh
(b) hvvhvvhh and vvhhvvhh
(c) vhvvvhvh and hhvhvhvv
(d) vvhhvvhh and hhhhvvvv
(e) vvvvvvvv and hhhhhhhh

6. Given that $a_0 = 1$, $a_n = n + (-1)^n a_{n-1}$ for $n > 2$. What is the value of $a_4$?
(a) 1 (b) 4 (c) 5 (d) 8 (e) 11


7. Given that $a_k = a_{k-1}/(1 + a_{k-1})$ for $k \ge 1$, $a_0 = 1$. Which of the following gives an explicit formula for $a_k$?

(a) …
(b) $1/2^k$, $k = 0, 1, 2, 3, \ldots$
(c) $1/(3^{k+1} - 2)$, $k = 0, 1, 2, 3, \ldots$
(d) $1/(k+1)$, $k = 0, 1, 2, 3, \ldots$
(e) $2/(k+2)$, $k = 0, 1, 2, 3, \ldots$

8. Consider the recurrence relation $a_k = -8a_{k-1} - 15a_{k-2}$ with initial conditions $a_0 = 0$ and $a_1 = 2$. Which of the following is an explicit solution to this recurrence relation?

(a) $a_k = (-3)^k - (-5)^k$
(d) $a_k = (-5)^k - (-3)^k$
(e) $a_k = k(-5)^k - k(-3)^k$

9. Consider the recurrence relation $a_k = 6a_{k-1} - 9a_{k-2}$ with initial conditions $a_0 = 0$ and $a_1 = 2$. Which of the following is an explicit solution to this recurrence relation, provided the constants $A$ and $B$ are chosen correctly?

(a) $a_n = A3^n + B3^n$
(b) $a_n = A3^n + B(-3)^n$
(c) $a_n = A3^n + nB3^n$
(d) $a_n = A(-3)^n + nB(-3)^n$
(e) $a_n = nA3^n + nB3^n$

10. In the Towers of Hanoi puzzle H(8, S, E, G), the configuration is
Pole S: 6, 5; Pole E: 1; Pole G: 8, 7, 4, 3, 2.
What move was just made to create this configuration?
(a) washer 1 from S to E
(b) washer 1 from G to E
(c) washer 2 from S to G
(d) washer 2 from E to G
(e) washer 5 from G to S

11. In the Towers of Hanoi puzzle H(8, S, E, G), the configuration is
Pole S: 6, 5; Pole E: empty; Pole G: 8, 7, 4, 3, 2, 1.
What are the next two moves?
(a) washer 1 from G to E followed by washer 2 from G to S
(b) washer 1 from G to S followed by washer 2 from G to E
(c) washer 5 from S to E followed by washer 1 from G to E
(d) washer 5 from S to E followed by washer 1 from G to S
(e) washer 5 from S to E followed by washer 2 from G to S

12. In the Towers of Hanoi puzzle H(8, S, E, G), the configuration is
Pole S: 6, 5, 2; Pole E: 1; Pole G: 8, 7, 4, 3.
The next move is washer 2 from S to G. What is the RANK of this move in the list of all moves for H(8, S, E, G)?
(a) 205 (b) 206 (c) 214 (d) 215 (e) 216

13. In the subset Gray code for n = 6, what is the next element after 111000?
(a) 000111
(b) 101000
(c) 111001
(d) 111100
(e) 101100

14. In the subset Gray code for n = 6, what is the element just before 110000?
(a) 010000
(b) 100000
(c) 110001
(d) 110100
(e) 111000

15. In the subset Gray code for n = 6, what is the RANK of 110000?
(a) 8 (b) 16 (c) 32 (d) 48 (e) 63

16. In the subset Gray code for n = 6, what is the element of RANK 52?
(a) 101011
(b) 101110
(c) 101101
(d) 110000
(e) 111000

17. The probability of team A winning any game is 1/3. Team A plays team B in a tournament. If either team wins two games in a row, that team is declared the winner. At most three games are played in the tournament and, if no team has won the tournament at the end of three games, the tournament is declared a draw. What is the expected number of games in the tournament?
(a) 3 (b) 19/9 (c) 22/9 (d) 25/9 (e) 61/27

18. The probability of team A winning any game is 1/2. Team A plays team B in a tournament. If either team wins two games in a row, that team is declared the winner. At most four games are played in the tournament and, if no team has won the tournament at the end of four games, the tournament is declared a draw. What is the expected number of games in the tournament?
(a) 4 (b) 11/4 (c) 13/4 (d) 19/4 (e) 21/8

19. A man starts with one dollar in a pot. A "play" consists of flipping a fair coin and,
• if heads occurs, doubling the amount in the pot,
• if tails occurs, losing one dollar from the pot.
The game ends if the man has zero dollars or if he has played three times. Let $Y$ denote the random variable which, for each outcome of the game, specifies the amount of money in the pot. What is the value of Var($Y$)?
(a) 9/8 (b) 10/8 (c) 12/8 (d) 14/8 (e) 447/64

20. We are given an urn that has one red ball and one white ball. A fair die is thrown. If the number is a 1 or 2, one red ball is added to the urn. Otherwise two red balls are added to the urn. A ball is then drawn at random from the urn. Given that a red ball was drawn, what is the probability that a 1 or 2 appeared when the die was thrown?
(a) 4/13 (b) 5/13 (c) 6/13 (d) 7/13 (e) 8/13

21. In a certain college,
• 10 percent of the students are science majors.
• 10 percent are engineering majors.
• 80 percent are humanities majors.
• Of the science majors, 20 percent have read Newsweek.
• Of the engineering majors, 10 percent have read Newsweek.
• Of the humanities majors, 20 percent have read Newsweek.
Given that a student selected at random has read Newsweek, what is the probability that that student is an engineering major?
(a) 1/19 (b) 2/19 (c) 5/19 (d) 9/19 (e) 10/19

22. The probability of team A winning any game is 1/3. Team A plays team B in a tournament. If either team wins two games in a row, that team is declared the winner. At most four games are played and, if no team has won the tournament at the end of four games, a draw is declared. Given that the tournament lasts more than two games, what is the probability that A is the winner?
(a) 1/9 (b) 2/9 (c) 4/9 (d) 5/9 (e) 6/9

23. Ten percent of the students are science majors (S), 20 percent are engineering majors (E), and 70 percent are humanities majors (H). Of S, 10 percent have read 2 or more articles in Newsweek, 20 percent 1 article, 70 percent 0 articles. For E, the corresponding percents are 5, 15, 80. For H they are 20, 30, 50. Given that a student has read 0 articles in Newsweek, what is the probability that the student is S or E (i.e., not H)?
(a) 21/58 (b) 23/58 (c) 12/29 (d) 13/29 (e) 1/2
Unit GT

Basic Concepts in Graph Theory


We introduced decision trees in Unit DT and used them to study decision making. However, we did not look at their structure as trees. In fact, we didn't even define a tree precisely. What is a tree? It is a particular type of graph, which brings us to the subject of this unit.


Section 1: What is a Graph? 


There are various types of graphs, each with its own definition. Unfortunately, some 
people apply the term “graph” rather loosely, so you can’t be sure what type of graph 
they’re talking about unless you ask them. After you have finished this chapter, we expect 
you to use the terminology carefully, not loosely. To motivate the various definitions, we’ll 
begin with some examples. 


Example 1 (A computer network) Computers are often linked with one another so 
that they can interchange information. Given a collection of computers, we would like to 
describe this linkage in fairly clean terms so that we can answer questions such as “How can 
we send a message from computer A to computer B using the fewest possible intermediate 
computers?” 


We could do this by making a list that consists of pairs of computers that are connected. 
Note that these pairs are unordered since, if computer C can communicate with computer 
D, then the reverse is also true. (There are sometimes exceptions to this, but they are 
rare and we will assume that our collection of computers does not have such an exception.) 
Also, note that we have implicitly assumed that the computers are distinguished from each 
other: It is insufficient to say that “A PC is connected to a Mac.” We must specify which 
PC and which Mac. Thus, each computer has a unique identifying label of some sort. 


For people who like pictures rather than lists, we can put dots on a piece of paper, 
one for each computer. We label each dot with a computer’s identifying label and draw a 
curve connecting two dots if and only if the corresponding computers are connected. Note 
that the shape of the curve does not matter (it could be a straight line or something more 
complicated) because we are only interested in whether two computers are connected or 
not. Below are two such pictures of the same graph. Each computer has been labeled by 
the initials of its owner. 


[Figure: two drawings of the same computer network graph, with vertices labeled by owners' initials, including CS, EN, MN, RL, SE, SH and SM.]
Computers (vertices) are indicated by dots (•) with labels. The connections (edges) are indicated by lines. When lines cross, they should be thought of as cables that lie on top of each other, not as cables that are joined. □


The notation $\mathcal{P}_k(V)$ stands for the set of all $k$-element subsets of the set $V$. Based on the previous example we have

Definition 1 (Simple graph) A simple graph $G$ is a pair $G = (V, E)$ where
• $V$ is a finite set, called the vertices of $G$, and
• $E$ is a subset of $\mathcal{P}_2(V)$ (i.e., a set $E$ of two-element subsets of $V$), called the edges of $G$.

In our example, the vertices are the computers and a pair of computers is in $E$ if and only if they are connected.
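One way to represent Definition 1 in code is to store each edge as a frozenset of two vertices, so that $\{u, v\}$ and $\{v, u\}$ are the same object, just as for sets. This Python sketch assumes that representation; the helper names and the small example graph are hypothetical (the edges of the network pictured above are not listed in the text).

```python
from itertools import combinations


def P(V, k):
    """P_k(V): the set of all k-element subsets of V, as frozensets."""
    return {frozenset(s) for s in combinations(V, k)}


def is_simple_graph(V, E):
    """Definition 1: (V, E) is a simple graph if E is a subset of P_2(V)."""
    return set(E) <= P(V, 2)


# A small hypothetical example (not the network in the figure):
V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
print(is_simple_graph(V, E))                        # True
print(is_simple_graph(V, E | {frozenset({4, 5})}))  # False: 5 is not in V
```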


Example 2 (Routes between cities) Imagine four cities named, with characteristic mathematical charm, A, B, C and D. Between these cities there are various routes of travel, denoted by a, b, c, d, e, f and g. Here is a picture of this situation:

[Figure: the cities A, B, C and D joined by the routes a, b, c, d, e, f and g.]


Looking at this picture, we see that there are three routes between cities B and C. These 
routes are named d,e and f. Our picture is intended to give us only information about 
the interconnections between cities. It leaves out many aspects of the situation that might 
be of interest to a traveler. For example, the nature of these routes (rough road, freeway, 
rail, etc.) is not portrayed. Furthermore, unlike a typical map, no claim is made that 
the picture represents in any way the distances between the cities or their geographical 
placement relative to each other. The object shown in this picture is called a graph. 


Following our previous example, one is tempted to list the pairs of cities that are 
connected; in other words, to extract a simple graph from the information. Unfortunately, 
this does not describe the problem adequately because there can be more than one route 
connecting a pair of cities; e.g., d, e and f connecting cities B and C in the figure. How 
can we deal with this? Here is a precise definition of a graph of the type required to handle 
this type of problem. □




Definition 2 (Graph) A graph is a triple $G = (V, E, \varphi)$ where
• $V$ is a finite set, called the vertices of $G$,
• $E$ is a finite set, called the edges of $G$, and
• $\varphi$ is a function with domain $E$ and codomain $\mathcal{P}_2(V)$.

In the pictorial representation of the cities graph, $G = (V, E, \varphi)$ where

$$V = \{A, B, C, D\}, \qquad E = \{a, b, c, d, e, f, g\}$$

and

$$\varphi = \begin{pmatrix} a & b & c & d & e & f & g \\ \{A,B\} & \{A,B\} & \{A,C\} & \{B,C\} & \{B,C\} & \{B,C\} & \{B,D\} \end{pmatrix}.$$


Definition 2 tells us that to specify a graph $G$ it is necessary to specify the sets $V$ and $E$ and the function $\varphi$. We have just specified $V$ and $\varphi$ in set theoretic terms. The picture of the cities graph specifies the same $V$ and $\varphi$ in pictorial terms. The set $V$ is represented clearly by dots (•), each of which has a city name adjacent to it. Similarly, the set $E$ is also represented clearly. The function $\varphi$ is determined from the picture by comparing the name attached to a route with the two cities connected by that route. Thus, the route name $d$ is attached to the route with endpoints $B$ and $C$. This means that $\varphi(d) = \{B, C\}$.


Note that, since part of the definition of a function includes its codomain and domain, $\varphi$ determines $\mathcal{P}_2(V)$ and $E$. Also, $V$ can be determined from $\mathcal{P}_2(V)$. Consequently, we could have said that a graph is a function $\varphi$ whose domain is a finite set and whose codomain is $\mathcal{P}_2(V)$ for some finite set $V$. Instead, we choose to specify $V$ and $E$ explicitly because the vertices and edges play a fundamental role in thinking about a graph $G$.


The function $\varphi$ is sometimes called the incidence function of the graph. The two elements of $\varphi(x) = \{u, v\}$, for any $x \in E$, are called the vertices of the edge $x$, and we say $u$ and $v$ are joined by $x$. We also say that $u$ and $v$ are adjacent vertices and that $u$ is adjacent to $v$ or, equivalently, $v$ is adjacent to $u$. For any $v \in V$, if $v$ is a vertex of an edge $x$ then we say $x$ is incident on $v$. Likewise, we say $v$ is a member of $x$, $v$ is on $x$, or $v$ is in $x$. Of course, $v$ is a member of $x$ actually means $v$ is a member of $\varphi(x)$.
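For a graph in the sense of Definition 2, edges need names of their own, because several edges may join the same pair of vertices. A minimal sketch, assuming $\varphi$ is stored as a Python dict from edge names to frozensets of endpoints, encodes the cities graph of Example 2 together with the notions of adjacent vertices and incident edges; the function names are ours.

```python
# The cities graph G = (V, E, phi) of Example 2, with phi stored as a dict
# mapping each edge name to the (frozen) set of its two vertices.
V = {"A", "B", "C", "D"}
phi = {
    "a": frozenset({"A", "B"}), "b": frozenset({"A", "B"}),
    "c": frozenset({"A", "C"}), "d": frozenset({"B", "C"}),
    "e": frozenset({"B", "C"}), "f": frozenset({"B", "C"}),
    "g": frozenset({"B", "D"}),
}
E = set(phi)  # the domain of phi


def adjacent(u, v):
    """u and v are adjacent if some edge x has phi(x) = {u, v}."""
    return any(phi[x] == {u, v} for x in E)


def incident_edges(v):
    """All edges x incident on v, i.e. with v a member of phi(x)."""
    return {x for x in E if v in phi[x]}


print(adjacent("A", "B"), adjacent("A", "D"))  # True False
print(sorted(incident_edges("C")))            # ['c', 'd', 'e', 'f']
```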


Here are two additional pictures of the same cities graph given above: 


The drawings look very different but exactly the same set $V$ and function $\varphi$ are specified in each case. It is very important that you understand exactly what information is needed to completely specify the graph. When thinking in terms of cities and routes between them, you naturally want the pictorial representation of the cities to represent their geographical positioning also. If the pictorial representation does this, that's fine, but it is not a part of the information required to define a graph. Geographical location is extra information. The geometrical positioning of the vertices $A$, $B$, $C$ and $D$ in the first of the two pictorial representations above is very different from their positioning in our original representation of the cities. However, in each of these cases, the vertices on a given edge are the same and hence the graphs specified are the same. In the second of the two pictures above, a different method of specifying the graph is given. There, $\varphi^{-1}$, the inverse of $\varphi$, is given. For example, $\varphi^{-1}(\{C, B\})$ is shown to be $\{d, e, f\}$. Knowing $\varphi^{-1}$ determines $\varphi$ and hence determines $G$ since the vertices $A$, $B$, $C$ and $D$ are also specified.
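The inverse $\varphi^{-1}$ described above can be computed by grouping edges according to their endpoints, and $\varphi$ can be rebuilt from it. The sketch below assumes the same dict encoding of the cities graph (edge name mapped to a frozenset of endpoints); inverse_image and from_inverse are illustrative names.

```python
from collections import defaultdict

# phi for the cities graph of Example 2 (edge name -> its two vertices).
phi = {
    "a": frozenset("AB"), "b": frozenset("AB"), "c": frozenset("AC"),
    "d": frozenset("BC"), "e": frozenset("BC"), "f": frozenset("BC"),
    "g": frozenset("BD"),
}


def inverse_image(phi):
    """phi^{-1}: each vertex pair in the image of phi -> set of edges joining it.
    (Pairs not in the image, such as {A, D}, would map to the empty set.)"""
    inv = defaultdict(set)
    for edge, ends in phi.items():
        inv[ends].add(edge)
    return dict(inv)


def from_inverse(inv):
    """Rebuild phi from phi^{-1}: every edge in inv[pair] has endpoints pair."""
    return {edge: pair for pair, edges in inv.items() for edge in edges}


inv = inverse_image(phi)
print(sorted(inv[frozenset({"C", "B"})]))   # ['d', 'e', 'f']
assert from_inverse(inv) == phi
```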


Example 3 (Loops) A loop is an edge that connects a vertex to itself. Graphs and simple graphs as defined in Definitions 1 and 2 cannot have loops. Why? Suppose $e \in E$ is a loop in a graph that connects $v \in V$ to itself. Then $\varphi(e) = \{v, v\} = \{v\}$ because repeated elements in the description of a set count only once; they're the same element. Since $\{v\} \notin \mathcal{P}_2(V)$, the range of $\varphi$, we cannot have $\varphi(e) = \{v, v\}$. In other words, we cannot have a loop.

Thus, if we want to allow loops, we will have to change our definitions. For a graph, we expand the codomain of $\varphi$ to be $\mathcal{P}_2(V) \cup \mathcal{P}_1(V)$. For a simple graph we need to change the set of allowed edges to include loops. This can be done by saying that $E$ is a subset of $\mathcal{P}_2(V) \cup \mathcal{P}_1(V)$ instead of a subset of just $\mathcal{P}_2(V)$. For example, if $V = \{1, 2, 3\}$ and $E = \{\{1, 2\}, \{2\}, \{2, 3\}\}$, this simple graph has a loop at vertex 2 and vertex 2 is connected by edges to the other two vertices. When we want to allow loops, we speak of a graph with loops or a simple graph with loops.

Examples of graphs with loops appear in the exercises. □
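Python's frozenset behaves like the mathematical sets used here: listing the element 2 twice still gives the one-element set $\{2\}$. The following sketch (helper name ours) uses that to model the edges of a simple graph with loops as members of $\mathcal{P}_2(V) \cup \mathcal{P}_1(V)$, checked on the example $V = \{1, 2, 3\}$ from the text.

```python
from itertools import combinations


def allowed_edges_with_loops(V):
    """P_2(V) union P_1(V): two-element subsets plus one-element subsets (loops)."""
    pairs = {frozenset(s) for s in combinations(V, 2)}
    singletons = {frozenset({v}) for v in V}
    return pairs | singletons


# Writing {v, v} gives {v}: a repeated element counts only once.
assert frozenset([2, 2]) == frozenset([2])

# The example from the text: a loop at 2, plus edges {1, 2} and {2, 3}.
V = {1, 2, 3}
E = {frozenset({1, 2}), frozenset({2}), frozenset({2, 3})}
print(E <= allowed_edges_with_loops(V))      # True
print([set(e) for e in E if len(e) == 1])    # [{2}]
```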


We have two definitions, Definition 1 (simple graph) and Definition 2 (graph). How are they related? Let $G = (V, E)$ be a simple graph. Define $\varphi: E \to \mathcal{P}_2(V)$ by $\varphi(e) = e$ for all $e \in E$; that is, $\varphi$ is the identity map on $E$, viewed as a function into $\mathcal{P}_2(V)$ (recall that $E \subseteq \mathcal{P}_2(V)$). The graph $G' = (V, E, \varphi)$ is essentially the same as $G$. There is one subtle difference in the pictures: The edges of $G$ are unlabeled but each edge of $G'$ is labeled by a set consisting of the two vertices at its ends. But this extra information is contained already in the specification of $G$. Thus, simple graphs are a special case of graphs.


Definition 3 (Degrees of vertices) Let $G = (V, E, \varphi)$ be a graph and $v \in V$ a vertex. Define the degree of $v$, $d(v)$, to be the number of $e \in E$ such that $v \in \varphi(e)$; i.e., $e$ is incident on $v$.

Suppose $|V| = n$. Let $d_1, d_2, \ldots, d_n$, where $d_1 \le d_2 \le \cdots \le d_n$, be the sequence of degrees of the vertices of $G$, sorted by size. We refer to this sequence as the degree sequence of the graph $G$.

In the graph for routes between cities, $d(A) = 3$, $d(B) = 6$, $d(C) = 4$, and $d(D) = 1$. The degree sequence is 1, 3, 4, 6.
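Definition 3 translates directly into a count over the values of $\varphi$. This sketch assumes the dict encoding of the cities graph used above (edge name mapped to a frozenset of endpoints) and recomputes the degrees and the degree sequence just stated; the function name is ours.

```python
# Cities graph of Example 2: edge name -> its two vertices.
phi = {
    "a": frozenset("AB"), "b": frozenset("AB"), "c": frozenset("AC"),
    "d": frozenset("BC"), "e": frozenset("BC"), "f": frozenset("BC"),
    "g": frozenset("BD"),
}
V = ["A", "B", "C", "D"]


def degree(v):
    """Definition 3: d(v) = number of edges e with v in phi(e)."""
    return sum(1 for ends in phi.values() if v in ends)


degrees = {v: degree(v) for v in V}
degree_sequence = sorted(degrees.values())
print(degrees)           # {'A': 3, 'B': 6, 'C': 4, 'D': 1}
print(degree_sequence)   # [1, 3, 4, 6]
```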
Sometimes we are interested only in the “structure” or “form” of a graph and not in
the names (labels) of the vertices and edges. In this case we are interested in what is called 
an unlabeled graph. A picture of an unlabeled graph can be obtained from a picture of a 
graph by erasing all of the names on the vertices and edges. This concept is simple enough, 
but is difficult to use mathematically because the idea of a picture is not very precise. 


The concept of an equivalence relation on a set is an important concept in mathe- 
matics and computer science. We'll explore it here and will use it to develop an intuitive 
understanding of unlabeled graphs. Later we will use it to define connected components 
and biconnected components. Equivalence relations are discussed in more detail in A Short 
Course in Discrete Mathematics, the text for the course that precedes this course. 


Definition 4 (Equivalence relation) An equivalence relation on a set $S$ is a partition of $S$. We say that $s, t \in S$ are equivalent if and only if they belong to the same block (called an equivalence class in this context) of the partition. If the symbol $\sim$ denotes the equivalence relation, then we write $s \sim t$ to indicate that $s$ and $t$ are equivalent.


Example 4 (Equivalence relations) Let S be any set and let all the blocks of the 
partition have one element. Two elements of S are equivalent if and only if they are the 
same. This rather trivial equivalence relation is, of course, denoted by “=”. 


Now let the set be the integers Z. Let’s try to define an equivalence relation by saying 
that n and k are equivalent if and only if they differ by a multiple of 24. Is this an 
equivalence relation? If it is we should be able to find the blocks of the partition. There are 
24 of them, which we could number 0,...,23. Block j consists of all integers which equal 
j plus a multiple of 24; that is, they have a remainder of j when divided by 24. Since two
numbers belong to the same block if and only if they both have the same remainder when 
divided by 24, it follows that they belong to the same block if and only if their difference 
gives a remainder of 0 when divided by 24, which is the same as saying their difference is 
a multiple of 24. Thus this partition does indeed give the desired equivalence relation. 
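The blocks of this partition are indexed by the possible remainders on division by 24. In Python the % operator returns a remainder in 0, ..., 23 even for negative integers, so n % 24 can serve as the block number. The sketch below (function names ours) checks on a range of integers that "same block" agrees with "difference is a multiple of 24".

```python
def same_block(n, k, m=24):
    """n and k lie in the same block j: they leave the same remainder on division by m."""
    return n % m == k % m


def differ_by_multiple(n, k, m=24):
    """n - k is a multiple of m."""
    return (n - k) % m == 0


# Python's % returns a remainder in 0..m-1 even for negative n,
# so n % 24 is exactly the block number j in 0..23.
assert all(same_block(n, k) == differ_by_multiple(n, k)
           for n in range(-100, 100) for k in range(-100, 100))
print(-5 % 24, 19 % 24)   # 19 19: -5 and 19 are in block 19
```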


Now let the set be $\mathbb{Z} \times \mathbb{Z}^*$, where $\mathbb{Z}^*$ is the set of all integers except 0. Write $(a, b) \sim (c, d)$ if and only if $ad = bc$. With a moment's reflection, you should see that this is a way
to check if the two fractions a/b and c/d are equal. We can label each equivalence class 
with the fraction a/b that it represents. In an axiomatic development of the rationals from 
the integers, one defines a rational number to be just such an equivalence class and proves 
that it is possible to add, subtract, multiply and divide equivalence classes. 
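The relation on pairs can be checked the same way. In this sketch (names ours) each class is labelled by the fraction $a/b$ in lowest terms via Python's fractions.Fraction, and we confirm on a small grid of pairs that $(a, b) \sim (c, d)$ holds exactly when the labels agree.

```python
from fractions import Fraction
from itertools import product


def equivalent(p, q):
    """(a, b) ~ (c, d) iff a*d == b*c; second coordinates must be nonzero."""
    (a, b), (c, d) = p, q
    return a * d == b * c


def label(p):
    """Name the equivalence class of (a, b) by the fraction a/b in lowest terms."""
    a, b = p
    return Fraction(a, b)


pairs = [(a, b) for a, b in product(range(-6, 7), repeat=2) if b != 0]
assert all(equivalent(p, q) == (label(p) == label(q)) for p in pairs for q in pairs)
print(label((2, 4)), label((-3, -6)), label((3, -6)))   # 1/2 1/2 -1/2
```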


Suppose we consider all functions $S = \underline{m}^{\underline{n}}$, that is, all functions from $\{1, \ldots, n\}$ to $\{1, \ldots, m\}$. We can define a partition of $S$ in a number of different ways. For example, we could partition the functions $f$ into blocks where the sum of the integers in Image($f$) is constant, where the max of the integers in Image($f$) is constant, or where the “type vector” of the function, namely, the number of 1's, 2's, etc. in Image($f$), is constant. Each of these defines a partition of $S$. □
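Partitioning a set of functions by some property can be sketched by computing the property for each function and grouping. Here a function from $\{1, \ldots, n\}$ to $\{1, \ldots, m\}$ is assumed to be written as the tuple $(f(1), \ldots, f(n))$, and the type vector is taken to mean how often each value $1, \ldots, m$ occurs among $f(1), \ldots, f(n)$; the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def partition_functions(key, m, n):
    """Group all functions f: {1..n} -> {1..m}, written as tuples
    (f(1), ..., f(n)), into blocks on which key(f) is constant."""
    blocks = defaultdict(list)
    for f in product(range(1, m + 1), repeat=n):
        blocks[key(f)].append(f)
    return dict(blocks)


def type_vector(f, m):
    """Number of times each of the values 1, 2, ..., m occurs in f."""
    return tuple(f.count(i) for i in range(1, m + 1))


m, n = 3, 2
by_type = partition_functions(lambda f: type_vector(f, m), m, n)
by_max = partition_functions(max, m, n)
print(len(by_type), len(by_max))   # 6 3
print(by_max[2])                   # [(1, 2), (2, 1), (2, 2)]
```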


In the next theorem we provide necessary and sufficient conditions for an equivalence 
relation. Verifying the conditions is a useful way to prove that some particular situation is 
an equivalence relation. Recall that a binary relation on a set $S$ is a subset $R$ of $S \times S$.

Theorem 1 (Equivalence Relations) Let $S$ be a set and suppose that we have a binary relation $R \subseteq S \times S$. We write $s \sim t$ whenever $(s, t) \in R$. This is an equivalence relation if and only if the following three conditions hold.

(i) (Reflexive) For all $s \in S$ we have $s \sim s$.
(ii) (Symmetric) For all $s, t \in S$ such that $s \sim t$ we have $t \sim s$.
(iii) (Transitive) For all $r, s, t \in S$ such that $r \sim s$ and $s \sim t$ we have $r \sim t$.


Proof: We first prove that an equivalence relation satisfies (i)–(iii). Suppose that $\sim$ is an equivalence relation. Since $s$ belongs to whatever block it is in, we have $s \sim s$. Since $s \sim t$ means that $s$ and $t$ belong to the same block, we have $s \sim t$ if and only if we have $t \sim s$. Now suppose that $r \sim s \sim t$. Then $r$ and $s$ are in the same block and $s$ and $t$ are in the same block. Thus $r$ and $t$ are in the same block and so $r \sim t$.

We now suppose that (i)–(iii) hold and prove that we have an equivalence relation. What would the blocks of the partition be? Everything equivalent to a given element should be in the same block. Thus, for each $s \in S$ let $B(s)$ be the set of all $t \in S$ such that $s \sim t$. We must show that the set of these sets forms a partition of $S$.

In order to have a partition of $S$, we must have
(a) the $B(s)$ are nonempty and every $t \in S$ is in some $B(s)$ and
(b) for every $p, q \in S$, $B(p)$ and $B(q)$ are either equal or disjoint.

Since $\sim$ is reflexive, $s \in B(s)$, proving (a). Suppose $x \in B(p) \cap B(q)$ and $y \in B(p)$. We have $p \sim x$, $q \sim x$ and $p \sim y$. By symmetry $x \sim p$, so $q \sim x \sim p \sim y$, and transitivity gives $q \sim y$. Thus $y \in B(q)$, proving that $B(p) \subseteq B(q)$. Similarly $B(q) \subseteq B(p)$ and so $B(p) = B(q)$. This proves (b). □
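The proof of Theorem 1 is constructive: from a relation satisfying (i)–(iii) it builds the blocks $B(s)$. The sketch below (function names ours) checks the three conditions for a finite relation given as a set of pairs and then forms the sets $B(s)$, merging repeats, on the example "$s - t$ is a multiple of 3" for $S = \{0, \ldots, 9\}$. The relation $\le$ is included as a non-example.

```python
def is_equivalence(S, R):
    """Check conditions (i)-(iii) of Theorem 1 for a relation R, given as a set of pairs."""
    reflexive = all((s, s) in R for s in S)
    symmetric = all((t, s) in R for (s, t) in R)
    transitive = all((r, t) in R
                     for (r, s) in R for (s2, t) in R if s == s2)
    return reflexive and symmetric and transitive


def blocks(S, R):
    """The sets B(s) = {t in S : s ~ t} from the proof; equal sets are merged."""
    return {frozenset(t for t in S if (s, t) in R) for s in S}


S = set(range(10))
same_mod_3 = {(s, t) for s in S for t in S if (s - t) % 3 == 0}
at_most = {(s, t) for s in S for t in S if s <= t}

print(is_equivalence(S, same_mod_3), is_equivalence(S, at_most))   # True False
print(sorted(sorted(B) for B in blocks(S, same_mod_3)))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```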


Example 5 (Equivalent forms) Consider the following two graphs, represented by 
pictures: 


Now let's remove all symbols representing edges and vertices. What we have left are two “forms” on which the graphs were drawn. You can think of drawing a picture of a graph as a two step process: (1) draw the form; (2) add the labels. One student referred to these forms as “ghosts of departed graphs.” Note that form $F_a$ and form $F_b$ have a certain eerie similarity (appropriate for ghosts).

If you use your imagination a bit you can see that form $F_a$ can be transformed into form $F_b$ by sliding vertices around and bending, stretching, and contracting edges as needed. The edges need not be detached from their vertices in the process and edges and vertices, while being moved, can pass through each other like shadows. Let's refer to the sliding, bending, stretching, and contracting process as “morphing” form $F_a$ into $F_b$. Morphing is easily seen to define an equivalence relation $\sim$ on the set of all forms. Check out reflexive, symmetric, and transitive for the morphing relation $\sim$. By Theorem 1, the morphing equivalence relation partitions the set of all forms of graphs into blocks or equivalence classes. This is a good example where it is easier to think of the relation $\sim$ than to think globally of the partition of the forms.

Now suppose we have any two graphs, $G_a = (V_a, E_a, \varphi_a)$ and $G_b = (V_b, E_b, \varphi_b)$. Think of these graphs not as pictures, but as specified in terms of sets and functions. Now choose forms $F_a$ and $F_b$ for $G_a$ and $G_b$ respectively, and draw their pictures. We leave it to your intuition to accept the fact that either $F_a \sim F_b$, no matter what you choose for $F_a$ and $F_b$, or $F_a \not\sim F_b$, no matter what your choice is for the forms $F_a$ and $F_b$. If $F_a \sim F_b$ we say that $G_a$ and $G_b$ are isomorphic graphs and write $G_a \approx G_b$. The fact that $\sim$ is an equivalence relation forces $\approx$ to be an equivalence relation also. In particular, two graphs $G_a$ and $G_b$ are isomorphic if and only if you can choose any form $F_a$ for drawing $G_a$ and use that same form for $G_b$. □


In general, deciding whether or not two graphs are isomorphic can be a very difficult business. You can imagine how hard it would be to look at the forms of two graphs with thousands of vertices and edges and decide whether or not those forms are equivalent. There are no general computer programs that do the task of deciding isomorphism well. For graphs known to have special features, isomorphism of graphs can sometimes be decided efficiently. In general, if someone presents you with two graphs and asks you if they are isomorphic, your best answer is “no.” You will be right most of the time.
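The text defines isomorphism intuitively, through forms. For simple graphs the usual formal counterpart, which we assume here, is a bijection $f$ from $V_a$ to $V_b$ such that $\{u, v\}$ is an edge of $G_a$ exactly when $\{f(u), f(v)\}$ is an edge of $G_b$. The brute-force sketch below tries every bijection, which shows concretely why the task gets out of hand: a graph with $n$ vertices has $n!$ candidates. The names and example graphs are ours.

```python
from itertools import permutations


def isomorphic(Va, Ea, Vb, Eb):
    """Brute-force isomorphism test for simple graphs (V, E), E a set of frozensets.

    Looks for a bijection f: Va -> Vb sending every edge {u, v} of Ea to an
    edge {f(u), f(v)} of Eb.  Since f is one-to-one and |Ea| = |Eb|, this makes
    f a bijection on edges as well.  All |Va|! bijections may be tried, so this
    is only practical for very small graphs.
    """
    if len(Va) != len(Vb) or len(Ea) != len(Eb):
        return False
    va = list(Va)
    for image in permutations(Vb):
        f = dict(zip(va, image))
        if all(frozenset(f[u] for u in e) in Eb for e in Ea):
            return True
    return False


def edges(*pairs):
    """Build an edge set from pairs of vertices."""
    return {frozenset(p) for p in pairs}


path_a = edges((1, 2), (2, 3), (3, 4))
path_b = edges(("w", "y"), ("y", "x"), ("x", "z"))
star = edges((1, 2), (1, 3), (1, 4))
print(isomorphic({1, 2, 3, 4}, path_a, {"w", "x", "y", "z"}, path_b))  # True
print(isomorphic({1, 2, 3, 4}, path_a, {1, 2, 3, 4}, star))            # False
```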
*Random Graphs


We now look briefly at a subject called random graphs. They often arise in the analysis of 
graphical algorithms and of systems which can be described graphically (such as the web). 
There are two basic ways to describe random graphs. One is to let the probability space 
be the set of all graphs with, for example, n vertices and q edges and use the uniform dis- 
tribution. The other, which is often easier to study, is described in the following definition. 
It is the one we study here. 


*Definition 5 (Random graph model) Let $G(n,p)$ be the probability space obtained
by letting the elementary events be the set of all $n$-vertex simple graphs with $V = \{1, 2, \dots, n\}$. If
$G \in G(n,p)$ has $m$ edges, then $P(G) = p^m q^{N-m}$ where $q = 1-p$ and $N = \binom{n}{2}$.


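We need to show that $G(n,p)$ is a probability space. There is a nice way to see this by
reinterpreting $P$. List the $N = \binom{n}{2}$ vertex pairs $\mathcal{P}_2(V)$ in lex order. Let the sample space be
$U = \times^N \{\text{choose}, \text{reject}\}$ with $P(a_1, \dots, a_N) = P^*(a_1) \times \cdots \times P^*(a_N)$, where $P^*(\text{choose}) = p$
and $P^*(\text{reject}) = 1 - p$. We’ve met this before in Unit Fn and seen that it is a probability
space. To see that it is, note that $P \ge 0$ and

$$\sum_{(a_1,\dots,a_N) \in U} P(a_1,\dots,a_N) = \sum_{(a_1,\dots,a_N) \in U} P^*(a_1) \times \cdots \times P^*(a_N) = \Big(\sum_{a_1} P^*(a_1)\Big) \times \cdots \times \Big(\sum_{a_N} P^*(a_N)\Big) = 1 \times \cdots \times 1 = 1.$$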


Why is this the same as the definition? Think of the chosen pairs as the edges of 
a graph chosen randomly from $G(n,p)$. If $G$ has $m$ edges, then its probability should be
$p^m(1-p)^{N-m}$ according to the definition. On the other hand, since $G$ has $m$ edges, exactly
$m$ of $a_1, \dots, a_N$ equal “choose” and so, in the new space, $P(a_1, \dots, a_N) = p^m(1-p)^{N-m}$
also. We say that we are choosing the edges of the random graph independently.
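The reinterpretation above also tells us how to generate a random graph from $G(n,p)$: go through the vertex pairs in lex order and flip an independent biased coin for each. A minimal Python sketch, assuming vertices $1, \dots, n$ and a hypothetical helper name `sample_gnp`:

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng=random):
    """Draw one graph from G(n, p) on the vertex set {1, ..., n}.

    The N = C(n, 2) vertex pairs are generated in lex order and each pair is
    independently 'chosen' (kept as an edge) with probability p, else 'rejected'.
    """
    return [pair for pair in combinations(range(1, n + 1), 2) if rng.random() < p]

edges = sample_gnp(6, 0.5, random.Random(2024))
print(len(edges), edges)
```

Passing a seeded `random.Random` object makes a draw reproducible; the default uses the module-level generator.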


*Example 6 (The number of edges in a random graph) Let $X$ be a random variable
that counts the number of edges in a random graph in $G(n,p)$. What are the expected
value and variance of $X$? In $U = \times^N \{\text{choose}, \text{reject}\}$, let

$$X_i(a_1, \dots, a_N) = \begin{cases} 1 & \text{if } a_i = \text{choose},\\ 0 & \text{if } a_i = \text{reject}. \end{cases}$$

You should be able to see that $X = X_1 + \cdots + X_N$ and that the $X_i$ are independent random
variables with $P(X_i = 1) = p$. This is just the binomial distribution (Unit Fn). We showed
that the mean is $Np$ and the variance is $Npq$, where $N = \binom{n}{2}$ and $q = 1-p$. □
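For small $n$ these formulas can be checked exactly by summing over every point of $U$. The sketch below (the helper name `edge_count_moments` is ours) uses Python's `fractions.Fraction` so that the arithmetic is exact; with $n = 4$ and $p = 1/3$ it should reproduce $Np = 2$ and $Npq = 4/3$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def edge_count_moments(n, p):
    """Exact mean and variance of the edge count X on G(n, p).

    Sums over every point (a_1, ..., a_N) of U = {choose, reject}^N, encoded
    as True/False, so it is only feasible for small n.
    """
    N = comb(n, 2)
    q = 1 - p
    mean = second = Fraction(0)
    for choices in product([True, False], repeat=N):
        m = sum(choices)              # X = X_1 + ... + X_N
        prob = p**m * q**(N - m)      # P(a_1, ..., a_N)
        mean += prob * m
        second += prob * m * m
    return mean, second - mean**2

p = Fraction(1, 3)
print(edge_count_moments(4, p))                  # (Fraction(2, 1), Fraction(4, 3))
print(comb(4, 2) * p, comb(4, 2) * p * (1 - p))  # 2 4/3
```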
*Example 7 (Triangles in random graphs) How often can we find 3 vertices {u,v, w}
in a random graph so that {u,v}, {u,w}, and {v,w} are all edges of the graph? In other 
words, how often can we find a “triangle”? How can we do this? 


First, we need a sample space. It will be the random graph space introduced in 
Definition 5. Since we want to count something (triangles), we need a random variable. 
Let X be a random variable whose value is the number of triples of vertices such that the 
three possible edges connecting them are present in the random graph. In other words, X 
is defined for each graph, G, and its value, X(G), is the number of triangles in the graph 
G. We want to compute E(X). It would also be nice to compute Var(X) since that gives
us some idea of how much X tends to vary from graph to graph — large Var(X) means 
there tends to be a lot of variation in the number of triangles from graph to graph and 
small Var(X) means there tends to be little variation. 


Let $X_{u,v,w}$ be a random variable which is 1 if the triangle with vertices $\{u,v,w\}$ is
present and 0 otherwise. Then $X$ is the sum of $X_{u,v,w}$ over all $\{u,v,w\} \in \mathcal{P}_3(V)$. Since
expectation is linear, $E(X)$ is the sum of $E(X_{u,v,w})$ over all $\{u,v,w\} \in \mathcal{P}_3(V)$. Clearly
$E(X_{u,v,w})$ does not depend on the particular triple. Since there are $\binom{n}{3}$ possibilities for
$\{u,v,w\}$, $E(X) = \binom{n}{3} E(X_{1,2,3})$.

We want to compute $E(X_{1,2,3})$. It is given by

$$E(X_{1,2,3}) = 0 \cdot P(X_{1,2,3} = 0) + 1 \cdot P(X_{1,2,3} = 1) = P(X_{1,2,3} = 1).$$

The only way $X_{1,2,3} = 1$ can happen is for the edges $\{1,2\}$, $\{1,3\}$, and $\{2,3\}$ to all be
present in the graph. (We don’t care about any of the other possible edges.) Since each of
these events has probability $p$ and the events are independent, we have $P(X_{1,2,3} = 1) = p^3$.
Thus $E(X_{1,2,3}) = p^3$ and so $E(X) = \binom{n}{3}p^3$. In other words, on average we see about $\binom{n}{3}p^3$
triangles. For example, if $p = 1/2$ all graphs are equally likely (you should show this) and
so the average number of triangles over all graphs with $n$ vertices is $\binom{n}{3}/8$. When $n = 5$,
this average is 1.25. Can you verify this by looking at all the 5-vertex graphs? How much
work is involved?
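One way to answer the question is to let a computer do the work. For $n = 5$ there are $N = \binom{5}{2} = 10$ possible edges and hence $2^{10} = 1024$ graphs, each examined for $\binom{5}{3} = 10$ possible triangles. The following Python sketch (the name `average_triangles` is ours) does exactly this brute-force count:

```python
from fractions import Fraction
from itertools import combinations

def average_triangles(n):
    """Average number of triangles over all simple graphs on {1, ..., n}.

    With p = 1/2 every graph in G(n, p) has probability 2^(-N), so this plain
    average equals E(X). There are 2^N graphs, N = C(n, 2), to examine.
    """
    vertices = range(1, n + 1)
    pairs = list(combinations(vertices, 2))     # the N possible edges
    triples = list(combinations(vertices, 3))   # candidate triangles {u, v, w}
    total = 0
    for mask in range(2 ** len(pairs)):         # bit i of mask: is pairs[i] an edge?
        edges = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        total += sum(1 for u, v, w in triples
                     if (u, v) in edges and (u, w) in edges and (v, w) in edges)
    return Fraction(total, 2 ** len(pairs))

print(average_triangles(5))  # 5/4, i.e. C(5,3)/8 = 1.25
```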

What happens when $n$ is very large? Then $\binom{n}{3} = \frac{n(n-1)(n-2)}{6}$ “behaves like” $n^3/6$.
(“Behaves like” means that, as $n$ goes to infinity, the limit of the ratio $\binom{n}{3}/(n^3/6)$ is 1.)
Thus the expected number of triangles behaves like $(np)^3/6$.

What about the variance? We’ll work it out in the next example. For now, we’ll
simply tell you that it behaves like $n^4p^5(1-p)/2$. What does this tell us for large $n$?
The standard deviation behaves like $n^2p^{5/2}\sqrt{(1-p)/2}$. A more general version of the
central limit theorem than we have discussed tells us the number of triangles tends to have
a normal distribution with $\mu = (np)^3/6$ and $\sigma = n^2p^{5/2}\sqrt{(1-p)/2}$. If $p$ is constant, $\sigma$ will
grow like a constant times $n^2$, which is much smaller than $\mu$ for large $n$. Thus the number
of triangles in a random graph is almost surely close to $(np)^3/6$. □


*Example 8 (Variance for triangles in random graphs) This is a continuation of the
previous example. Since the various $X_{u,v,w}$ may not be independent, this is harder. Since
$\mathrm{Var}(X) = E(X^2) - E(X)^2$, we will compute $E(X^2)$. Since $X$ is a sum of terms of the form
$X_{u,v,w}$, $X^2$ is a sum of terms of the form $X_{u,v,w}X_{a,b,c}$. Using linearity of expectation, we
need to compute $E(X_{u,v,w}X_{a,b,c})$ for each possibility and add them up.
Now for the tricky part: This expectation depends on how many vertices $\{u,v,w\}$ and
$\{a,b,c\}$ have in common.

• If $\{u,v,w\} = \{a,b,c\}$, then $X_{u,v,w}X_{a,b,c} = X_{u,v,w}$ and its expectation is $p^3$ by the
previous example.

• If $\{u,v,w\}$ and $\{a,b,c\}$ have two vertices in common, then the two triangles have only
5 edges total because they have a common edge. Note that $X_{u,v,w}X_{a,b,c}$ is 1 if all
five edges are present and is 0 otherwise. Reasoning as in the previous example, the
expectation is $p^5$.

• If $\{u,v,w\}$ and $\{a,b,c\}$ have fewer than two vertices in common, we are concerned about
six edges and obtain $p^6$ for the expectation.


To add up the results in the previous paragraph, we need to know how often each case
occurs in

$$X^2 = \Big(\sum_{\{u,v,w\} \in \mathcal{P}_3(V)} X_{u,v,w}\Big)\Big(\sum_{\{a,b,c\} \in \mathcal{P}_3(V)} X_{a,b,c}\Big) = \sum_{\substack{\{u,v,w\} \in \mathcal{P}_3(V)\\ \{a,b,c\} \in \mathcal{P}_3(V)}} X_{u,v,w}X_{a,b,c}.$$

• When $\{u,v,w\} = \{a,b,c\}$, we are only free to choose $\{u,v,w\}$ and this can be done
in $\binom{n}{3}$ ways, so we have $\binom{n}{3}p^3$ contributed to $E(X^2)$.

• Suppose $\{u,v,w\}$ and $\{a,b,c\}$ have two vertices in common. How many ways can this
happen? We can first choose $\{u,v,w\}$. Then choose two of $u$, $v$, $w$ to be in $\{a,b,c\}$
and then choose the third vertex in $\{a,b,c\}$ to be different from $u$, $v$, and $w$. This can
be done in

$$\binom{n}{3} \times \binom{3}{2} \times (n-3) = 3(n-3)\binom{n}{3} = 12\binom{n}{4}$$

ways. Multiplying this by $p^5$ gives its contribution to $E(X^2)$.

• The remaining case, one vertex or no vertices in common, can be done in a similar
fashion. Alternatively, we can simply subtract the above counts from all possible ways
of choosing $\{u,v,w\}$ and $\{a,b,c\}$. This gives us

$$\binom{n}{3} \times \binom{n}{3} - \binom{n}{3} - 12\binom{n}{4}$$

for the third case. Multiplying this by $p^6$ gives its contribution to $E(X^2)$.

Since $E(X)^2 = \binom{n}{3}^2 p^6$, we have that

$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = \binom{n}{3}(p^3 - p^6) + 12\binom{n}{4}(p^5 - p^6),$$

after a bit of algebra using the results in the preceding paragraph. Whew! □
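The algebra is easy to get wrong, so it is reassuring to check the final formula against a brute-force computation. The Python sketch below (helper names are ours) computes $E(X)$ and $\mathrm{Var}(X)$ exactly for small $n$ by summing over all $2^N$ graphs with the probabilities of Definition 5, and compares the variance with $\binom{n}{3}(p^3 - p^6) + 12\binom{n}{4}(p^5 - p^6)$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def triangle_moments(n, p):
    """Exact E(X) and Var(X) for the triangle count X on G(n, p), by summing
    over all 2^N graphs with P(G) = p^m q^(N-m) (Definition 5)."""
    q = 1 - p
    pairs = list(combinations(range(1, n + 1), 2))
    triples = list(combinations(range(1, n + 1), 3))
    N = len(pairs)
    mean = second = Fraction(0)
    for mask in range(2 ** N):
        edges = {pairs[i] for i in range(N) if mask >> i & 1}
        x = sum(1 for u, v, w in triples
                if (u, v) in edges and (u, w) in edges and (v, w) in edges)
        prob = p ** len(edges) * q ** (N - len(edges))
        mean += prob * x
        second += prob * x * x
    return mean, second - mean ** 2

def triangle_variance_formula(n, p):
    """Var(X) = C(n,3)(p^3 - p^6) + 12 C(n,4)(p^5 - p^6), as derived above."""
    return comb(n, 3) * (p**3 - p**6) + 12 * comb(n, 4) * (p**5 - p**6)

p = Fraction(1, 3)
mean, var = triangle_moments(5, p)
print(mean == comb(5, 3) * p**3, var == triangle_variance_formula(5, p))  # True True
```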


The previous material would be more difficult if we had used the model for random 
graphs that was suggested before Definition 5. Why is this? The model we are using lets 
us ignore possible edges that we don’t care about. The other model does not because we 
must be sure that the total number of edges is correct. 
Exercises for Section 1

1.1. We are interested in the number of simple graphs with $V = \{1, 2, \dots, n\}$.
(a) Prove that there are $2^{\binom{n}{2}}$ (2 to the power $\binom{n}{2}$) such simple graphs.
(b) How many of them have exactly $q$ edges?

1.2. Let $(V, E, \phi)$ be a graph and let $d(v)$ be the degree of the vertex $v \in V$. Prove that
$\sum_{v \in V} d(v) = 2|E|$, an even number. Conclude that the number of vertices $v$ for
which $d(v)$ is odd is even.

1.3. Let $Q = (V, E, \phi)$ be a graph where

$$V = \{A, B, C, D, E, F, G, H\}, \qquad E = \{a, b, c, d, e, f, g, h, i, j, k\}$$

and

$$\phi = \begin{pmatrix} a & b & c & d & e & f & g & h & i & j & k \\ A & A & D & E & A & E & B & F & G & C & A \\ B & D & E & B & B & G & F & G & C & C & A \end{pmatrix}$$


In this representation of $\phi$, the first row specifies the edges and the two vertices
below each edge specify the vertices incident on that edge. Here is a pictorial
representation $P(Q)$ of this graph.

[Figure: pictorial representation P(Q) of the graph Q]

Note that $\phi(k) = \{A, A\} = \{A\}$. Such an edge is called a loop. (See Example 3.)
Adding a loop to a vertex increases its degree by two. The vertex $H$, which does
not belong to $\phi(x)$ for any edge $x$ (i.e., has no edge incident upon it), is called an
isolated vertex. The degree of an isolated vertex is zero. Edges, such as $a$ and $e$
of $Q$, with the property that $\phi(a) = \phi(e)$ are called parallel edges. If all edge and
vertex labels are removed from $P(Q)$ then we get the following picture $P'(Q)$:

[Figure: P'(Q), the picture P(Q) with all labels removed]

The picture $P'(Q)$ represents the “form” of the graph just described and is sometimes
referred to as a pictorial representation of the “unlabeled” graph associated
with $Q$. (See Example 5.) For each of the following graphs $R$, where $R = (V, E, \phi)$,
$V = \{A, B, C, D, E, F, G, H\}$, draw a pictorial representation of $R$ by starting with
$P'(Q)$, removing and/or adding as few edges as possible, and then labeling the
resulting picture with the edges and vertices of $R$. A graph $R$ which requires no
additions or removals of edges is said to be “of the same form as” or “isomorphic
to” the graph $Q$ (Example 5).
(a) Let
$$E = \{a, b, c, d, e, f, g, h, i, j, k\}$$
be the set of edges of $R$ and
$$\phi = \begin{pmatrix} a & b & c & d & e & f & g & h & i & j & k \\ C & C & F & A & H & E & E & A & D & A & A \\ C & G & G & H & H & H & F & H & G & D & F \end{pmatrix}$$
(b) Let
$$E = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11\}$$
be the set of edges of $R$ and
1 2 3 4 5 6 7 8 9 10 11 
o=|A EF FE FE F GHBC ODE 
G H E F GH B C D D EH 


1.4. Let $Q = (V, E, \phi)$ be the graph where
$$V = \{A, B, C, D, E, F, G, H\}, \qquad E = \{a, b, c, d, e, f, g, h, i, j, k, l\}$$
and
$$\phi = \begin{pmatrix} a & b & c & d & e & f & g & h & i & j & k & l \\ A & A & D & E & A & E & B & F & G & C & A & E \\ B & D & E & B & B & G & F & G & C & C & A & G \end{pmatrix}$$

(a) What is the degree sequence of $Q$?

Consider the following unlabeled pictorial representation $P'(Q)$ of $Q$.

(b) Create a pictorial representation of $Q$ by labeling $P'(Q)$ with the edges and
vertices of $Q$.

(c) A necessary condition that a pictorial representation of a graph $R$ can be
created by labeling $P'(Q)$ with the vertices and edges of $R$ is that the degree
sequence of $R$ be (0, 2, 2, 3, 4, 4, 4, 5). True or false? Explain.

(d) A sufficient condition that a pictorial representation of a graph $R$ can be created
by labeling $P'(Q)$ with the vertices and edges of $R$ is that the degree sequence
of $R$ be (0, 2, 2, 3, 4, 4, 4, 5). True or false? Explain.


1.5. In each of the following problems, information about the degree sequence of a graph
is given. In each case, decide if a graph satisfying the specified conditions exists or
not. Give reasons in each case.

(a) A graph Q with degree sequence (1, 1, 2, 3, 3, 5)?
(b) A graph Q with degree sequence (1, 2, 2, 3, 3, 5), loops and parallel edges allowed?

(c) A graph Q with degree sequence (1, 2, 2, 3, 3, 5), no loops but parallel edges allowed?

(d) A graph Q with degree sequence (1, 2, 2, 3, 3, 5), no loops or parallel edges allowed?

(e) A simple graph Q with degree sequence (3, 3, 3, 3)?

(f) A graph Q with degree sequence (3, 3, 3, 3), no loops or parallel edges allowed?

(g) A graph Q with degree sequence (3, 3, 3, 5), no loops or parallel edges allowed?

(h) A graph Q with degree sequence (4, 4, 4, 4, 4), no loops or parallel edges allowed?

(i) A graph Q with degree sequence (4, 4, 4, 4, 6), no loops or parallel edges allowed?

1.6. Divide the following graphs into isomorphism equivalence classes and justify your
answer; i.e., explain why you have the classes that you do. In all cases $V = \{1, 2, 3, 4\}$.

™ a b c d € f
© = (4% {1,2} {2,3} {3,4} {1,4} en)
A B C D EB F
0) = (4) {1,4} {1,4} {1,2} {2,3} eu)
“ = (oly {1,3} {34} {4} {1,2} |

PpeQ RK S T U
@) 6= (435 {2,4} {1,3} {3,4} {1,2} a)

1.7. In Example 7, suppose that $p$ is a function of $n$, say $p = p(n)$.

(a) Show that the expected number of triangles behaves like 1 for large $n$ if $p(n) = 6^{1/3}/n$.

(b) Suppose the expected number of triangles behaves like 1. How does the expected
number of edges behave?

*1.8. Instead of looking for triangles as in Example 7, let’s look for quadrilaterals having
both diagonals. In other words, we’ll look for sets of four vertices such that all of
the $\binom{4}{2} = 6$ possible edges between them are present.

(a) Show that the expected number of such quadrilaterals is $\binom{n}{4}p^6$.

(b) Suppose $n$ is large and $p$ is a function of $n$ so that we expect to see 1 quadrilateral
on average. About how many edges do we expect to see?

(c) Generalize this problem from sets of 4 vertices to sets of $k$ vertices.


*1.9. Show that the variance of $X$, the number of triangles in a random graph as computed
in Example 8, satisfies

$$\mathrm{Var}(X) = \binom{n}{3}p^3\Big((1-p^3) + 3(n-3)p^2(1-p)\Big) \le 3n\binom{n}{3}p^3(1-p^3).$$

Hint: $1 - p^2 \le 1 - p^3 \le 1$.


Section 2: Digraphs, Paths, and Subgraphs 


In this section we introduce the notion of a directed graph and give precise definitions 
of some very important special substructures of both graphs and directed graphs. 


Example 9 (Flow of commodities) Look again at Example 2. Imagine now that the 
symbols a, b, c, d, e, f and g, instead of standing for route names, stand for commodities 
(applesauce, bread, computers, etc.) that are produced in one town and shipped to another 
town. In order to get a picture of the flow of commodities, we need to know the directions 
in which they are shipped. This information is provided by the picture below:

In set-theoretic terms, the information needed to construct the above picture can be
specified by giving a triple $D = (V, E, \phi)$ where $\phi$ is a function. The domain of the function
$\phi$ is $E = \{a, b, c, d, e, f, g\}$ and the codomain is $V \times V$. Specifically,

$$\phi = \begin{pmatrix} a & b & c & d & e & f & g \\ (B,A) & (A,B) & (C,A) & (C,B) & (B,C) & (C,B) & (D,B) \end{pmatrix}.$$

The structure specified by this information is an example of a directed graph, which we now
define. □
Definition 6 (Directed graph) A directed graph (or digraph) is a triple $D = (V, E, \phi)$
where $V$ and $E$ are finite sets and $\phi$ is a function with domain $E$ and codomain $V \times V$.
We call $E$ the set of edges of the digraph $D$ and call $V$ the set of vertices of $D$.

Just as with graphs, we can define a notion of a simple digraph. A simple digraph is a pair
$D = (V, E)$, where $V$ is a set, the vertex set, and $E \subseteq V \times V$ is the edge set. Just as with
simple graphs and graphs, simple digraphs are a special case of digraphs in which $\phi$ is the
identity function on $E$; that is, $\phi(e) = e$ for all $e \in E$.


There is a correspondence between simple graphs and simple digraphs that is fairly
common in applications of graph theory. To interpret simple graphs in terms of simple
digraphs, it is best to consider simple graphs with loops (see Example 3 and Exercises
for Section 1). Thus consider $G = (V, E)$ where $E \subseteq \mathcal{P}_2(V) \cup \mathcal{P}_1(V)$. We can identify
$\{u,v\} \in \mathcal{P}_2(V) \cup \mathcal{P}_1(V)$ with $(u,v) \in V \times V$ and with $(v,u) \in V \times V$. In the case where we
have a loop, $u = v$, then we identify $\{u\}$ with $(u,u)$. Here is a picture of a simple graph
and its corresponding digraph:

[Figure: (a) a simple graph with loops; (b) the corresponding simple digraph]

Each edge that is not a loop in the simple graph is replaced by two edges “in opposite
directions” in the corresponding simple digraph. A loop is replaced by a directed loop
(e.g., $\{A\}$ is replaced by $(A, A)$).


Simple digraphs appear in mathematics under another important guise: binary relations.
A binary relation on a set $V$ is simply a subset of $V \times V$. Often the name of the
relation and the subset are the same. Thus we speak of the binary relation $E \subseteq V \times V$. If
you have absorbed all the terminology, you should be able to see immediately that $(V, E)$
is a simple digraph and that any simple digraph $(V', E')$ corresponds to a binary relation
$E' \subseteq V' \times V'$.

Recall that a binary relation $R$ is called symmetric if $(u,v) \in R$ implies $(v,u) \in R$.
Thus a simple graph with loops allowed corresponds to a symmetric binary relation on the
set of vertices.
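The correspondence between simple graphs with loops and symmetric relations can be sketched in a few lines of Python. Representing an edge $\{u, v\}$ or a loop $\{u\}$ as a Python set is our own assumption, as are the function names:

```python
def graph_to_relation(edges):
    """Binary relation (edge set of a simple digraph) for a simple graph with loops.

    edges: iterable of sets, {u, v} for an ordinary edge and {u} for a loop.
    Each {u, v} becomes (u, v) and (v, u); each loop {u} becomes (u, u).
    """
    relation = set()
    for e in edges:
        ends = sorted(e)
        u, v = ends[0], ends[-1]      # for a loop, u == v
        relation.add((u, v))
        relation.add((v, u))
    return relation

def is_symmetric(relation):
    """True if (u, v) in the relation always implies (v, u) in the relation."""
    return all((v, u) in relation for (u, v) in relation)

rel = graph_to_relation([{'A', 'B'}, {'A'}])
print(sorted(rel))                                # [('A', 'A'), ('A', 'B'), ('B', 'A')]
print(is_symmetric(rel), is_symmetric({(1, 2)}))  # True False
```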


An equivalence relation on a set $S$ is a particular type of binary relation $R \subseteq S \times S$.
For an equivalence relation, we have $(x,y) \in R$ if and only if $x$ and $y$ are equivalent (i.e.,
belong to the same equivalence class or block). Note that this is a symmetric relationship,
so we may regard the associated simple digraph as a simple graph. Which simple graphs
(with loops allowed) correspond to equivalence relations? As an example, take $S = \{1, 2, \dots, 7\}$ and
take the equivalence class partition to be $\{\{1,2,3,4\}, \{5,6,7\}\}$. Since everything in each
block is related to everything else, there are $\binom{4}{2} = 6$ non-loops and $\binom{4}{1} = 4$ loops associated
with the block $\{1,2,3,4\}$ for a total of ten edges. With the block $\{5,6,7\}$ there are three
loops and three non-loops for a total of six edges. Here is the graph of this equivalence
relation:

[Figure: the graph of the equivalence relation with blocks {1,2,3,4} and {5,6,7}]

A complete simple graph $G = (V, E)$ with loops is a graph with every possible edge.
That is, $E = \mathcal{P}_2(V) \cup \mathcal{P}_1(V)$. In the above graph, each block of the equivalence relation is
replaced by the complete simple graph with loops on that block. This is the general rule.
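The general rule can be sketched in Python: for each block, add every 2-element subset as an edge and every vertex as a loop. With the blocks $\{1,2,3,4\}$ and $\{5,6,7\}$ this should give the $10 + 6 = 16$ edges counted above, 7 of them loops. The name `equivalence_graph` is hypothetical.

```python
from itertools import combinations

def equivalence_graph(blocks):
    """Edge set of the simple graph (loops allowed) of the equivalence relation
    whose classes are `blocks`: a complete graph with loops on each block.

    Non-loops are 2-element frozensets {u, v}; loops are 1-element frozensets {u}.
    """
    edges = set()
    for block in blocks:
        edges.update(frozenset(pair) for pair in combinations(block, 2))
        edges.update(frozenset([v]) for v in block)
    return edges

g = equivalence_graph([{1, 2, 3, 4}, {5, 6, 7}])
loops = sum(1 for e in g if len(e) == 1)
print(len(g), loops, len(g) - loops)  # 16 7 9
```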


A basic method for studying graphs and digraphs is to study substructures of these 
objects and their properties. One of the most important of these substructures is called a 
path. 


Definition 7 (Path, trail, walk and vertex sequence) Let $G = (V, E, \phi)$ be a graph.

Let $e_1, e_2, \dots, e_{n-1}$ be a sequence of elements of $E$ (edges of $G$) for which there is a
sequence $a_1, a_2, \dots, a_n$ of distinct elements of $V$ (vertices of $G$) such that $\phi(e_i) = \{a_i, a_{i+1}\}$
for $i = 1, 2, \dots, n-1$. The sequence of edges $e_1, e_2, \dots, e_{n-1}$ is called a path in $G$. The
sequence of vertices $a_1, a_2, \dots, a_n$ is called the vertex sequence of the path. (Note that since
the vertices are distinct, so are the edges.)

If we require that $e_1, \dots, e_{n-1}$ be distinct, but not that $a_1, \dots, a_n$ be distinct, the
sequence of edges is called a trail.

If we do not even require that the edges be distinct, it is called a walk.

If $G = (V, E, \phi)$ is a directed graph, then $\phi(e_i) = \{a_i, a_{i+1}\}$ is replaced by $\phi(e_i) =
(a_i, a_{i+1})$ in the above definition to obtain a directed path, trail, and walk respectively.

Note that the definition of a path requires that it not intersect itself (i.e., have repeated
vertices), while a trail may intersect itself. Although a trail may intersect itself, it may not
have repeated edges, but a walk may. If $P = (e_1, \dots, e_{n-1})$ is a path in $G = (V, E, \phi)$ with
vertex sequence $a_1, \dots, a_n$, then we say that $P$ is a path from $a_1$ to $a_n$. Similarly for a trail
or a walk.

In the graph of Example 2, the sequence c, d, g is a path with vertex sequence A, C, B, D.
If the graph is of the form $G = (V, E)$ with $E \subseteq \mathcal{P}_2(V)$, then the vertex sequence alone
specifies the sequence of edges and hence the path. Thus, in Example 1, the vertex sequence
MN, SM, SE, TM specifies the path {MN, SM}, {SM, SE}, {SE, TM}. Similarly for digraphs.
Consider the graph of Example 9. The edge sequence $P = (g, e, c)$ is a directed
path with vertex sequence $(D, B, C, A)$. The edge sequence $P = (g, e, c, b, a)$ is a directed
trail, but not a directed path. The edge sequence $P = (d, e, d)$ is a directed walk, but not
a directed trail.
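The three definitions are easy to test mechanically. The Python sketch below encodes the function $\phi$ of Example 9 as a dictionary (each edge mapped to its (start, end) pair) and classifies an edge sequence as a directed path, trail, or walk; the function name `classify` is ours.

```python
# phi for the commodity digraph of Example 9: edge -> (start, end)
PHI = {'a': ('B', 'A'), 'b': ('A', 'B'), 'c': ('C', 'A'), 'd': ('C', 'B'),
       'e': ('B', 'C'), 'f': ('C', 'B'), 'g': ('D', 'B')}

def classify(edge_seq, phi=PHI):
    """Classify a non-empty edge sequence as a directed 'path', 'trail' or 'walk'.

    Returns None if consecutive edges do not chain end-to-start.
    """
    for x, y in zip(edge_seq, edge_seq[1:]):
        if phi[x][1] != phi[y][0]:
            return None
    vertices = [phi[edge_seq[0]][0]] + [phi[x][1] for x in edge_seq]
    if len(set(vertices)) == len(vertices):
        return 'path'       # distinct vertices (hence distinct edges)
    if len(set(edge_seq)) == len(edge_seq):
        return 'trail'      # distinct edges, but some vertex repeats
    return 'walk'           # some edge repeats

for seq in ('gec', 'gecba', 'ded', 'ag'):
    print(seq, classify(seq))
```

A string such as `'gec'` is used here as a convenient sequence of one-letter edge names.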
Note that every path is a trail and every trail is a walk, but not conversely. However,
we can show that, if there is a walk between two vertices, then there is a path. This rather
obvious result can be useful in proving theorems, so we state it as a theorem.

Theorem 2 (Walk implies path) Suppose $u \ne v$ are vertices in the graph $G = (V, E, \phi)$.
The following are equivalent:

(a) There is a walk from $u$ to $v$.
(b) There is a trail from $u$ to $v$.
(c) There is a path from $u$ to $v$.

Furthermore, given a walk from $u$ to $v$, there is a path from $u$ to $v$ all of whose edges are
in the walk.

Proof: Since every path is a trail, (c) implies (b). Since every trail is a walk, (b) implies
(a). Thus it suffices to prove that (a) implies (c). Let $e_1, e_2, \dots, e_k$ be a walk from $u$ to
$v$. We use induction on $n$, the number of repeated vertices in a walk. If the walk has no
repeated vertices, it is a path. This starts the induction at $n = 0$. Suppose $n > 0$. Let
$r$ be a repeated vertex. Suppose it first appears in edge $e_i$ and last appears in edge $e_j$.

If $r = u$, then $e_j, \dots, e_k$ is a walk from $u$ to $v$ in which $r$ is not a repeated vertex. If
$r = v$, then $e_1, \dots, e_i$ is a walk from $u$ to $v$ in which $r$ is not a repeated vertex. Otherwise,
$e_1, \dots, e_i, e_j, \dots, e_k$ is a walk from $u$ to $v$ in which $r$ is not a repeated vertex. Hence there
are fewer than $n$ repeated vertices in this walk from $u$ to $v$, and so there is a path by induction.
Since we constructed the path by removing edges from the walk, the last statement in the
theorem follows. □


Note that the theorem and proof are valid if graph is replaced by digraph and walk, trail, 
and path are replaced by directed walk, trail, and path. 
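The proof of Theorem 2 is constructive, and it translates into a short procedure. The Python sketch below (names are ours) repeatedly applies the proof's idea: whenever the walk visits a vertex more than once, it jumps from the first visit to the last, dropping the edges in between. Since each jump moves strictly forward along the walk, no vertex is kept twice, and every edge kept comes from the walk. It works unchanged for directed walks.

```python
def walk_to_path(vertices, edges):
    """Shorten a walk to a path using only edges of the walk (Theorem 2).

    vertices: the walk's vertex sequence (k + 1 entries);
    edges: its k edges, where edges[i] joins vertices[i] and vertices[i + 1].
    """
    last = {v: i for i, v in enumerate(vertices)}   # index of the last visit
    path_vertices, path_edges = [], []
    i = 0
    while True:
        v = vertices[i]
        path_vertices.append(v)
        i = last[v]                  # skip ahead to the last visit of v
        if i == len(vertices) - 1:   # reached the end of the walk
            return path_vertices, path_edges
        path_edges.append(edges[i])  # the edge leaving that last visit
        i += 1

# The directed trail (g, e, c, b, a) of Example 9 has vertex sequence D, B, C, A, B, A.
print(walk_to_path(['D', 'B', 'C', 'A', 'B', 'A'], ['g', 'e', 'c', 'b', 'a']))
# (['D', 'B', 'A'], ['g', 'a'])
```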


Another basic notion is that of a subgraph of $G = (V, E, \phi)$, which we will soon define.
First we need some terminology about functions. By a restriction $\phi'$ of $\phi$ to $E' \subseteq E$, we
mean the function $\phi'$ with domain $E'$ and satisfying $\phi'(x) = \phi(x)$ for all $x \in E'$. (When
forming a restriction, we may change the codomain. Of course, the new codomain must
contain $\mathrm{Image}(\phi') = \phi(E')$. In the following definition, the codomain of $\phi'$ must be $\mathcal{P}_2(V')$
since $G'$ is required to be a graph.)

Definition 8 (Subgraph) Let $G = (V, E, \phi)$ be a graph. A graph $G' = (V', E', \phi')$ is
a subgraph of $G$ if $V' \subseteq V$, $E' \subseteq E$, and $\phi'$ is the restriction of $\phi$ to $E'$.

As we have noted, the fact that $G'$ is itself a graph means that $\phi'(x) \in \mathcal{P}_2(V')$ for each
$x \in E'$ and, in fact, the codomain of $\phi'$ must be $\mathcal{P}_2(V')$. If $G$ is a graph with loops,
the codomain of $\phi'$ must be $\mathcal{P}_2(V') \cup \mathcal{P}_1(V')$. This definition works equally well if $G$ is a
digraph. In that case, the codomain of $\phi'$ must be $V' \times V'$.
Example 10 (Subgraph: key information) For the graph $G = (V, E, \phi)$ below,
let $G' = (V', E', \phi')$ be defined by $V' = \{A, B, C\}$, $E' = \{a, b, c, f\}$, and by $\phi'$ being the
restriction of $\phi$ to $E'$ with codomain $\mathcal{P}_2(V')$. Notice that $\phi'$ is determined completely from
knowing $V'$, $E'$ and $\phi$. Thus, to specify a subgraph $G'$, the key information is $V'$ and $E'$.

As another example from the same graph, we let $V' = V$ and $E' = \{a, b, c, f\}$. In this
case, the vertex $D$ is not a member of any edge of the subgraph. Such a vertex is called an
isolated vertex of $G'$. (See also Exercises for Section 1.)

One way of specifying a subgraph is to give a set of edges $E' \subseteq E$ and take $V'$ to be
the set of all vertices on some edge of $E'$. In other words, $V'$ is the union of the sets $\phi(x)$
over all $x \in E'$. Such a subgraph is called the subgraph induced by the edge set $E'$ or the
edge induced subgraph of $E'$. The first subgraph of this example is the subgraph induced
by $E' = \{a, b, c, f\}$.

Likewise, given a set $V' \subseteq V$, we can take $E'$ to be the set of all edges $x \in E$ such
that $\phi(x) \subseteq V'$. The resulting subgraph is called the subgraph induced by $V'$ or the vertex
induced subgraph of $V'$. Referring to the picture again, the edges of the subgraph induced
by $V' = \{C, B\}$ are $E' = \{d, e, f\}$.


[Figure: the graph G = (V, E, φ) of Example 10, with vertices A, B, C and D]
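Both kinds of induced subgraph are one-liners once $\phi$ is stored as a dictionary. Because the figure is not reproduced here, the Python sketch below uses a small hypothetical $\phi$ chosen only to agree with the facts stated in this example ($E' = \{a, b, c, f\}$ induces $V' = \{A, B, C\}$, and $V' = \{B, C\}$ induces $E' = \{d, e, f\}$); it is not necessarily the pictured graph, and the function names are ours.

```python
# A small hypothetical multigraph (an assumption, not read off the figure):
# edge -> set of its end vertices.
PHI = {'a': {'A', 'B'}, 'b': {'A', 'B'}, 'c': {'A', 'C'}, 'd': {'B', 'C'},
       'e': {'B', 'C'}, 'f': {'B', 'C'}, 'g': {'C', 'D'}}

def edge_induced(edge_set, phi=PHI):
    """Subgraph induced by E': V' is the union of phi(x) over x in E'."""
    vertices = set().union(*(phi[x] for x in edge_set))
    return vertices, {x: phi[x] for x in edge_set}

def vertex_induced(vertex_set, phi=PHI):
    """Subgraph induced by V': E' is every edge x with phi(x) a subset of V'."""
    vs = set(vertex_set)
    return vs, {x: ends for x, ends in phi.items() if ends <= vs}

print(sorted(edge_induced({'a', 'b', 'c', 'f'})[0]))  # ['A', 'B', 'C']
print(sorted(vertex_induced({'B', 'C'})[1]))          # ['d', 'e', 'f']
```

Each function returns the pair $(V', \phi')$, where the dictionary $\phi'$ is the restriction of $\phi$ to $E'$, which matches the observation that $V'$ and $E'$ carry all the key information.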


Look again at the above graph. In particular, consider the path c, a with vertex sequence
C, A, B. Notice that the edge d has $\phi(d) = \{C, B\}$. The subgraph $G' = (V', E', \phi')$,
where $V' = \{C, A, B\}$ and $E' = \{c, a, d\}$, is called a cycle of $G$. In general, whenever
there is a path in $G$, say $e_1, \dots, e_{n-1}$ with vertex sequence $a_1, \dots, a_n$, and an edge $x$ with
$\phi(x) = \{a_1, a_n\}$, then the subgraph induced by the edges $e_1, \dots, e_{n-1}, x$ is called a cycle of
$G$. Parallel edges like $a$ and $b$ in the preceding figure induce a cycle. A loop also induces a
cycle. □

The formal definition of a cycle is:

Definition 9 (Circuit and Cycle) Let $G = (V, E, \phi)$ be a graph and let $e_1, \dots, e_n$ be
a trail with vertex sequence $a_1, \dots, a_n, a_1$. (It returns to its starting point.) The subgraph
$G'$ of $G$ induced by the set of edges $\{e_1, \dots, e_n\}$ is called a circuit of $G$. The length of the
circuit is $n$.

• If the only repeated vertices on the trail are $a_1$ (the start and end), then the circuit is
called a simple circuit or cycle.

• If “trail” is replaced by directed trail, we obtain a directed circuit and a directed cycle.
In our definitions, a path is a sequence of edges but a cycle is a subgraph of $G$. In actual
practice, people often think of a cycle as a path, except that it starts and ends at the same
vertex. This sloppiness rarely causes trouble, but can lead to problems in formal proofs.
Cycles are closely related to the existence of multiple paths between vertices:

Theorem 3 (Cycles and multiple paths) Two vertices $u \ne v$ are on a cycle of $G$ if
and only if there are at least two paths from $u$ to $v$ that have no vertices in common except
the endpoints $u$ and $v$.

Proof: Suppose $u$ and $v$ are on a cycle. Follow the cycle from $u$ to $v$ to obtain one path.
Then follow the cycle from $v$ to $u$ to obtain another. Since a cycle has no repeated vertices,
the only vertices that lie in both paths are $u$ and $v$. On the other hand, a path from $u$ to
$v$ followed by a path from $v$ to $u$ is a cycle if the paths have no vertices in common other
than $u$ and $v$. □


One important feature of a graph is whether or not any pair of vertices can be connected 
by a path. You can probably imagine, without much difficulty, applications of graph theory 
where this sort of “connectivity” is important. Not the least of such examples would be 
communication networks. Here is a formal definition of connected graphs. 


Definition 10 (Connected graph) Let $G = (V, E, \phi)$ be a graph. If for any two distinct
elements $u$ and $v$ of $V$ there is a path $P$ from $u$ to $v$ then $G$ is a connected graph. If $|V| = 1$,
then $G$ is connected.


We make two observations about the definition. 


• Because of Theorem 2, we can replace “path” in the definition by “walk” or “trail” if
we wish. (This observation is used in the next example.)

• The last sentence in the definition is not really needed. To see this, suppose $|V| = 1$.
Now $G$ is connected if, for any two distinct elements $u$ and $v$ of $V$, there is a path from
$u$ to $v$. This is trivially satisfied since we cannot find two distinct elements in the
one-element set $V$.


The graph of Example 1 has two distinct “pieces.” It is not a connected graph. There
is, for example, no path from $u$ = TM to $v$ = CS. Note that one piece of this graph consists
of the vertex induced subgraph of the vertex set {CS, EN, SH, RL} and the other piece
consists of the vertex induced subgraph of {TM, SE, MN, SM}. These pieces are called
connected components of the graph. This is the case in general for a graph $G = (V, E, \phi)$:
The vertex set is partitioned into subsets $V_1, V_2, \dots, V_m$ such that if $u$ and $v$ are in the
same subset then there is a path from $u$ to $v$ and if they are in different subsets there is no
such path. The subgraphs $G_1 = (V_1, E_1, \phi_1), \dots, G_m = (V_m, E_m, \phi_m)$ induced by the sets
$V_1, \dots, V_m$ are called the connected components of $G$. Every edge of $G$ appears in one of
the connected components. To see this, suppose that $\{u, v\}$ is an edge and note that the
edge is a path from $u$ to $v$ and so $u$ and $v$ are in the same induced subgraph, $G_i$. By the
definition of induced subgraph, $\{u, v\}$ is in $G_i$.
Example 11 (Connected components as an equivalence relation) You may have
noticed that the “definition” that we have given of connected components is a bit sloppy:
We need to know that the partitioning into such subsets can actually occur. To see that this
is not trivially obvious, define two integers to be “connected” if they have a common factor
greater than 1. Thus 2 and 6 are connected and 3 and 6 are connected, but 2 and 3 are not
connected and so we cannot partition the set $V = \{2, 3, 6\}$ into “connected components”. We must use
some property of the definition of graphs and paths to show that the partitioning of vertices
is possible. One way to do this is to construct an equivalence relation.

For $u, v \in V$, write $u \sim v$ if and only if either $u = v$ or there is a walk from $u$ to
$v$. It is clear that $\sim$ is reflexive and symmetric (a walk from $u$ to $v$, traversed backwards,
is a walk from $v$ to $u$). We now prove that it is transitive. Let $u \sim v \sim w$. The walk from
$u$ to $v$ followed by the walk from $v$ to $w$ is a walk from $u$ to $w$. This completes the proof that
$\sim$ is an equivalence relation. The relation partitions $V$ into subsets $V_1, \dots, V_m$. By Theorem 2,
the vertex induced subgraphs of the $V_i$ satisfy Definition 10. □
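In practice, the blocks $V_1, \dots, V_m$ of $\sim$ are usually found by a graph search rather than by examining walks directly. Here is a minimal Python sketch using breadth-first search; the choice of technique and the function name are ours (depth-first search or union-find would work equally well). Edges are given as vertex pairs, so loops and parallel edges cause no trouble.

```python
from collections import defaultdict, deque

def connected_components(vertices, edges):
    """Blocks of the relation u ~ v ('u = v or there is a walk from u to v').

    edges: iterable of (u, v) pairs; loops (u, u) and parallel edges are fine.
    Each block is grown by breadth-first search from a not-yet-seen vertex.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, blocks = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        block, queue = {start}, deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    block.add(y)
                    queue.append(y)
        blocks.append(block)
    return blocks

blocks = connected_components(range(1, 7), [(1, 2), (2, 3), (4, 5), (5, 5)])
print([sorted(b) for b in blocks])  # [[1, 2, 3], [4, 5], [6]]
```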


When talking about connectivity, graphs and digraphs are different. In a digraph, the 
fact that there is a directed walk from u to v does not, in general, imply that there is a 
directed walk from v to u. Thus, the “directed walk relation”, unlike the “walk relation” 
is not symmetric. This complicates the theory of connectivity for digraphs. 


Example 12 (Eulerian graphs) We are going to describe a process for constructing a 
graph $G = (V, E, \phi)$ (with loops allowed). Start with $V = \{v_1\}$ consisting of a single vertex
and with $E = \emptyset$. Add an edge $e_1$, with $\phi(e_1) = \{v_1, v_2\}$, to $E$. If $v_1 = v_2$, we have a graph
with one vertex and one edge (a loop), else we have a graph with two vertices and one
edge. Keep track of the vertices and edges in the order added. Here $(v_1, v_2)$ is the sequence
of vertices in the order added and $(e_1)$ is the sequence of edges in the order added. Suppose
we continue this process to construct a sequence of vertices (not necessarily distinct) and a
sequence of distinct edges. At the point where $k$ distinct edges have been added, if $v$ is
the last vertex added, then we add a new edge $e_{k+1}$, different from all previous edges, with
$\phi(e_{k+1}) = \{v, v'\}$ where either $v'$ is a vertex already added or a new vertex. Here is a
picture of this process carried out with the edges numbered in the order added,


where the vertex sequence is 
S = (4,0, 0)6,d,.0, D, Fg, 6; G:0,4). 


Such a graph is called a graph with an Eulerian trail. The edges, in the order added, are
the Eulerian trail and $S$ is the vertex sequence of the trail.


By construction, if $G$ is a graph with an Eulerian trail, then there is a trail in $G$ that
includes every edge in $G$. If there is a circuit in $G$ that includes every edge of $G$ then $G$
is called an Eulerian circuit graph or graph with an Eulerian circuit. Thinking about the
above example, if a graph has an Eulerian trail but no Eulerian circuit, then all vertices of
the graph have even degree except the start vertex ($a$ in our example, with degree 5) and
end vertex ($g$ in our example, with degree 3). If a graph has an Eulerian circuit then all
vertices have even degree. The converses in each case are also true (but take a little work
to show): If $G$ is a connected graph in which every vertex has even degree then $G$ has an
Eulerian circuit. If $G$ is a connected graph with all vertices but two of even degree, then $G$
has an Eulerian trail joining the two vertices of odd degree. □
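The degree conditions above give a quick test that is easy to automate. The Python sketch below (names are ours) assumes the graph is already known to be connected, counts a loop twice toward the degree of its vertex, and only decides existence; it does not construct the Eulerian trail or circuit itself.

```python
from collections import Counter

def eulerian_status(edges):
    """Classify a connected graph by the degree criteria for Eulerian trails.

    edges: list of (u, v) pairs, with a loop written (u, u); parallel edges are
    simply repeated pairs. Returns 'circuit', 'trail', or 'neither'.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1      # a loop (u, u) adds 2 to deg(u), as it should
    odd = [v for v, d in degree.items() if d % 2 == 1]
    if not odd:
        return 'circuit'    # every degree even
    if len(odd) == 2:
        return 'trail'      # an Eulerian trail joins the two odd vertices
    return 'neither'

print(eulerian_status([(1, 2), (2, 3), (3, 1)]))  # circuit
print(eulerian_status([(1, 1), (1, 2)]))          # trail
print(eulerian_status([(0, 1), (0, 2), (0, 3)]))  # neither
```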


Here is a precise definition of Eulerian trail and circuit. 


Definition 11 (Eulerian trail, circuit) Let $G = (V, E, \phi)$ be a connected graph. If
there is a trail with edge sequence $(e_1, e_2, \dots, e_k)$ in $G$ which uses each edge in $E$, then
$(e_1, e_2, \dots, e_k)$ is called an Eulerian trail. If there is a circuit $C = (V', E', \phi')$ in $G$ with
$E' = E$, then $C$ is called an Eulerian circuit.


The ideas of a directed Eulerian circuit and directed Eulerian trail for directed graphs are 
defined in exactly the same manner. 


An Eulerian circuit in a graph contains every edge of that graph. What about a cycle 
that contains every vertex but not necessarily every edge? Our next example discusses that 
issue. 


Example 13 (Hamiltonian cycle) Start with a graph G' = (V, E', φ') that is a
cycle and then add additional edges, without adding any new vertices, to obtain a graph
G = (V, E, φ). As an example, consider


[Figure: the cycle G' and the graph G.]


where the first graph G' = (V, E', φ') is the cycle induced by the edges {a, b, c, d, e, f}. The
second graph G = (V, E, φ) is obtained from G' by adding edges g, h, i and j. A graph that
can be constructed from such a two-step process is called a Hamiltonian graph. The cycle
G' is called a Hamiltonian cycle for G.


Definition 12 (Hamiltonian cycle, Hamiltonian graph) A cycle in a graph G =
(V, E, φ) is a Hamiltonian cycle for G if every element of V is a vertex of the cycle. A
graph G = (V, E, φ) is Hamiltonian if it has a subgraph that is a Hamiltonian cycle for G.


Notice that an Eulerian circuit uses every edge exactly once and a Hamiltonian cycle uses 
every vertex exactly once. We gave a very simple characterization of when a graph has 
an Eulerian circuit (in terms of degrees of vertices). There is no simple characterization
of when a graph has a Hamiltonian cycle. On the contrary, the issue of whether or not a 
graph has a Hamiltonian cycle is notoriously difficult to resolve in general. 
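Since no simple criterion is available, the most direct way to settle the question for a small graph is exhaustive search. The sketch below is a hypothetical helper, not a method from the text. It fixes one vertex and tries every ordering of the others. The running time grows like (n − 1)!, which shows why the problem is hard in general. It assumes a simple graph given by a vertex list and an edge list.

```python
from itertools import permutations


def hamiltonian_cycle(vertices, edges):
    """Search exhaustively for a Hamiltonian cycle in a simple graph.

    Returns the vertices of one Hamiltonian cycle in order, or None.
    The first vertex is fixed and every ordering of the others is tried,
    so the work grows like (n - 1)! for n vertices.
    """
    vertices = list(vertices)
    if len(vertices) < 3:              # a cycle in a simple graph needs 3 vertices
        return None
    joined = {frozenset(e) for e in edges}
    first, rest = vertices[0], vertices[1:]
    for order in permutations(rest):
        cycle = [first, *order]
        n = len(cycle)
        # each vertex must be joined to the next, and the last to the first
        if all(frozenset((cycle[i], cycle[(i + 1) % n])) in joined
               for i in range(n)):
            return cycle
    return None
```

For a 4-cycle with one diagonal, the first ordering tried already succeeds. For a star with three leaves, every ordering fails and the function returns None.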


As we already mentioned, connectivity issues in digraphs are much more difficult than
in graphs. A digraph is strongly connected if, for every two vertices v and w, there is a
directed path from v to w. From any digraph D, we can construct a simple graph S(D) on
the same set of vertices by letting {v, w} be an edge of S(D) if and only if at least one of
(v, w) and (w, v) is an edge of D. You should be able to show that if D is strongly connected,
then S(D) is connected. The converse is false. As an example, take D = (V, E) to be the
simple digraph where V = {1, 2} and E = {(1, 2)}. There is no directed path from 2 to 1,
but clearly S(D) = (V, {{1, 2}}) is connected.
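To experiment with this distinction, test strong connectivity by checking that every vertex can reach every other vertex along directed edges. Then compare the result with ordinary connectivity of S(D). The helper names below are our own. The search runs once from every vertex, which keeps the code simple rather than efficient.

```python
def reachable(start, succ):
    """Vertices reachable from start by following the lists in succ."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in succ[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def strongly_connected(vertices, arcs):
    """Is there a directed path from every vertex to every other vertex?"""
    succ = {v: [] for v in vertices}
    for u, w in arcs:
        succ[u].append(w)
    return all(reachable(v, succ) == set(vertices) for v in vertices)


def underlying_connected(vertices, arcs):
    """Is S(D) connected?  {v, w} is an edge of S(D) when (v, w) or (w, v) is an arc."""
    nbrs = {v: [] for v in vertices}
    for u, w in arcs:
        nbrs[u].append(w)
        nbrs[w].append(u)
    first = next(iter(vertices))
    return reachable(first, nbrs) == set(vertices)
```

For the example in the text, `strongly_connected([1, 2], [(1, 2)])` is False, while `underlying_connected([1, 2], [(1, 2)])` is True.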


Other issues for digraphs analogous to those for graphs work out pretty well, but are
more technical. An example is the notion of degree for vertices. For any subset U of the
vertices V of a directed graph D = (V, E), define $d_{in}(U)$ to be the number of edges e
of D with φ(e) of the form (w, u) where u ∈ U and w ∉ U. Define $d_{out}(U)$ similarly. If
U = {v} consists of just one vertex, $d_{in}(U)$ is usually written simply as $d_{in}(v)$ rather than
the more technically correct $d_{in}(\{v\})$. Similarly, we write $d_{out}(v)$. You should compute
$d_{in}(v)$ and $d_{out}(v)$ for the vertices v of the graph of Example 9. You should be able to show
that $\sum d_{in}(v) = \sum d_{out}(v) = |E|$, where the sums range over all v ∈ V. See the Exercises
for Section 1 for the idea.
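The definitions translate directly into code. In this sketch the digraph is a list of arcs (w, u), so repeated arcs are allowed. The function names `d_in` and `d_out` are our own. A loop (v, v) has both ends in {v}, so neither count includes it. The sum identity is therefore checked here on a loopless example.

```python
def d_in(U, arcs):
    """Number of arcs (w, u) with u in U and w not in U."""
    return sum(1 for w, u in arcs if u in U and w not in U)


def d_out(U, arcs):
    """Number of arcs (u, w) with u in U and w not in U."""
    return sum(1 for u, w in arcs if u in U and w not in U)


# A loopless digraph on {1, 2, 3, 4}; the arc (1, 3) appears twice.
arcs = [(1, 2), (2, 3), (3, 1), (1, 3), (1, 3), (4, 1)]
vertices = {1, 2, 3, 4}
total_in = sum(d_in({v}, arcs) for v in vertices)
total_out = sum(d_out({v}, arcs) for v in vertices)
print(total_in, total_out, len(arcs))   # 6 6 6
```

The functions also accept larger sets. For example, `d_in({1, 2}, arcs)` counts only the arcs (3, 1) and (4, 1) that enter {1, 2} from outside.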


Example 14 (Bicomponents of graphs) Let G = (V, E, φ) be a graph. For e, f ∈ E
write e ~ f if either e = f or there is a cycle of G that contains both e and f. We claim
that this is an equivalence relation. The reflexive and symmetric parts are easy. Suppose
that e ~ f ~ g. If two of e, f and g are equal, then e ~ g follows at once, so suppose that
e, f and g are distinct. Let φ(e) = {v1, v2}. Let
C(e, f) be the cycle containing e and f and C(f, g) the cycle containing f and g. In C(e, f)
there is a path P1 from v1 to v2 that does not contain e. Let x and y ≠ x be the first
and last vertices on P1 that lie on the cycle containing f and g. We know that there must
be such points because the edge f is on P1. Let P2 be the path in C(e, f) from y to x
containing e. In C(f, g) there is a path P3 from x to y containing g. We claim that P2
followed by P3 defines a cycle containing e and g.


Some examples may help. Consider a graph that consists of two disjoint cycles that 
are joined by an edge. There are three bicomponents — each cycle and the edge joining 
them. Now consider three cycles that are disjoint except for one vertex that belongs to all 
three of them. Again there are three bicomponents — each of the cycles. 


Since ~ is an equivalence relation on the edges of G, it partitions them. If the
partition has only one block, then we say that G is a biconnected graph. If E' is a block
in the partition, the subgraph of G induced by E' is called a bicomponent of G. Note that
the bicomponents of G are not necessarily disjoint: bicomponents may have vertices in
common (but never edges). There are four bicomponents in the following graph. Two are
the cycles, one is the edge {C, O}, and the fourth consists of all of the rest of the edges.
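Computing bicomponents straight from the definition of ~ would mean searching for cycles through pairs of edges. A standard alternative, which the text does not describe, is the depth-first "low point" method of Hopcroft and Tarjan. For a simple graph it yields the same edge classes. The sketch assumes a simple graph given by vertex and edge lists. The function name is hypothetical, and the recursion limits it to modest graph sizes.

```python
def bicomponents(vertices, edges):
    """Partition the edges of a simple graph into bicomponents.

    Uses the Hopcroft-Tarjan depth-first search with low points.
    vertices: iterable of vertex names; edges: list of pairs (u, v), u != v,
    with no repeated edge.  Returns a list of sets of edges, each edge
    stored as a frozenset {u, v}.
    """
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = {}    # discovery time of each vertex
    low = {}     # earliest discovery time reachable from a subtree by one back edge
    stack = []   # edges seen but not yet assigned to a bicomponent
    blocks = []
    clock = 0

    def visit(u, parent):
        nonlocal clock
        disc[u] = low[u] = clock
        clock += 1
        for w in adj[u]:
            if w == parent:
                continue                     # do not reuse the tree edge to the parent
            if w not in disc:                # tree edge: explore w's subtree
                stack.append(frozenset((u, w)))
                visit(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= disc[u]:
                    # nothing below w climbs above u, so the edges stacked
                    # since {u, w} form one bicomponent
                    block = set()
                    while True:
                        e = stack.pop()
                        block.add(e)
                        if e == frozenset((u, w)):
                            break
                    blocks.append(block)
            elif disc[w] < disc[u]:          # back edge to an ancestor
                stack.append(frozenset((u, w)))
                low[u] = min(low[u], disc[w])

    for v in adj:
        if v not in disc:
            visit(v, None)
    return blocks
```

Two cases from the text make good checks. Two triangles joined by an edge give three bicomponents, with 3, 1 and 3 edges. Three triangles sharing one vertex give three bicomponents of 3 edges each.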
Exercises for Section 2


2.1. A graph G = (V, E) is called bipartite if V can be partitioned into two sets C
and S such that each edge has one vertex in C and one vertex in S. As a specific
example, let C be the set of courses at the university and S the set of students.
Let V = C ∪ S and let {s, c} ∈ E if and only if student s is enrolled in course c.
(a) Prove that G = (V, E) is a simple graph.


(b) Prove that every cycle of G has an even number of edges. 


2.2. In each of the following graphs, find the longest trail (most edges) and longest 
circuit. If the graph has an Eulerian circuit or trail, say so. 


2.3. For each of the following graphs G = (V, E, φ), find a cycle in G of maximum length.
State whether or not the graph is Hamiltonian.
2.4. We are interested in the number of simple digraphs with V = n.


(a) Find the number of them.

(b) Find the number of them with no loops.

(c) In both cases, find the number of them with exactly q edges.


2.5. An oriented simple graph is a simple graph which has been converted to a digraph
by assigning an orientation to each edge. The orientation of {u, v} can be thought
of as a mapping of it to either (u, v) or (v, u).


(a) Give an example of a simple digraph that has no loops but is not an oriented
simple graph.


(b) Find the number of oriented simple digraphs.

(c) Find the number of them with exactly q edges.


2.6. A binary relation R on S is an order relation if it is reflexive, antisymmetric, and
transitive. R is antisymmetric if for all (x, y) ∈ R with x ≠ y, (y, x) ∉ R.
Given an order relation R, the covering relation H of R consists of all (x, z) ∈ R,
x ≠ z, such that there is no y, distinct from both x and z, such that (x, y) ∈ R and
(y, z) ∈ R. A pictorial representation of the covering relation as a directed graph
is called a “Hasse diagram” of H.


(a) Show that the divides relation on

S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}

is an order relation. By definition, (x, y) is in the divides relation on S if x
is a factor of y. Thus, (4, 12) is in the divides relation. x|y is the standard
notation for "x is a factor of y".


(b) Find and draw a picture of the directed graph of the covering relation of the
divides relation.
Hint: You must find all pairs (x, z) ∈ S × S such that x|z but there does not
exist any y, x < y < z, such that x|y and y|z.
Section 3: Trees


Trees play an important role in a variety of algorithms. We have used decision trees 
to enhance our understanding of recursion. In this section, we define trees precisely and 
look at some of their properties. 


Definition 13 (Tree) If G is a connected graph without any cycles then G is called a 
tree. (If |V| =1, then G is connected and hence is a tree.) A tree is also called a free tree. 


The graph of Example 2 is connected but is not a tree. It has many cycles, including
({A, B, C}, {a, e, c}). The subgraph of this graph induced by the edges {a, e, g} is a tree. If
G is a tree, then φ is an injection since if e1 ≠ e2 and φ(e1) = φ(e2), then {e1, e2} induces
a cycle. In other words, a graph with parallel edges is not a tree. Likewise, a loop is a
cycle, so a tree has no loops. Thus, we can think of a tree as a simple graph when we are
not interested in names of the edges.


Since the notion of a tree is so important, it will be useful to have some equivalent
definitions of a tree. We state them as a theorem.


Theorem 4 (Alternative definitions of a tree) If G is a connected graph, the following
are equivalent. 


(a) G is a tree. 

(b) G has no cycles. 

(c) For every pair of vertices u ≠ v in G, there is exactly one path from u to v.
(d) Removing any edge from G gives a graph which is not connected. 


(e) The number of vertices of G is one more than the number of edges of G. 


Proof: We are given that G is connected; thus, by the definition of a tree, (a) and (b)
are equivalent.


Theorem 3 can be used to prove that (b) implies (c). We leave that as an exercise 
(show not (c) implies not (b)). 


If {u,v} is an edge, it follows from (c) that the edge is the only path from u to v and 
so removing it disconnects the graph. Hence (c) implies (d). 


We leave it as an exercise to prove that (d) implies (b) (show not (b) implies not (d)). 


Thus far, we have shown (a) and (b) are equivalent, and we have shown that (b) implies 
(c) implies (d) implies (b), so (a), (b), (c), and (d) are all equivalent. All that remains is 
to include (e) in this equivalence class of statements. To do this, all we have to do is show 
that (e) implies any of the equivalent statements (a), (b), (c), and (d) and, conversely, some 
one of (a), (b), (c), and (d) implies (e). We shall show that (b) implies (e) and that (e) 
implies (a). 
We first show that (b) implies (e). We will use induction on the number of vertices of
G. If G has one vertex, it has no edges and (e) is satisfied. Otherwise, we claim that G has
a vertex u of degree 1; that is, it lies on only one edge {u, w}. We prove this claim shortly.
Remove u and {u, w} to obtain a graph H with one less edge and one less vertex. Since G
is connected and has no cycles, the same is true of H. By the induction hypothesis, H has
one fewer edge than it has vertices. Since we got from G to H by removing one vertex and one edge,
G must also have one fewer edge than it has vertices. By induction, the proof is done. It remains to
prove the existence of u. Suppose no such u exists; that is, suppose that each vertex lies
on at least two edges. We will derive a contradiction. Start at any vertex v1 of G and leave v1
by some edge e1 to reach another vertex v2. Leave v2 by some edge e2 different from the
edge used to reach v2. Continue with this process. Since each vertex lies on at least two
edges, the process never stops. Since G has only finitely many vertices, we eventually
repeat a vertex, say


$$v_1, e_1, v_2, \ldots, v_k, e_k, \ldots, v_n, e_n, v_{n+1} = v_k.$$


The edges $e_k, \ldots, e_n$ form a cycle, which is a contradiction.


Having shown that (b) implies (e), we now show that (e) implies (a). We use the
contrapositive and show that not (a) implies not (e). Thus we assume G is not a tree.
Since (a) and (d) are equivalent, (d) fails, so we can remove an edge from G to get a new
graph which is still connected. If this is not a tree, repeat the process and keep doing so
until we reach a tree T. For the tree T, (a) holds trivially, (a) implies (b), and (b) implies (e).
Thus, the number of vertices of T is one more than the number of edges of T. Since, in going
from G to T, we removed at least one edge from G but did not remove any vertices, G must
have at least as many edges as vertices. This shows not (a) implies not (e) and completes
the proof. □
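Theorem 4(e) gives a cheap test for trees: a graph is a tree exactly when it is connected and has one more vertex than it has edges. The sketch below assumes the vertices are 0, 1, ..., n − 1 and the edges are given as pairs. Connectivity is checked with a small union-find structure, which is our choice and not the text's. The name `is_tree` is hypothetical.

```python
def is_tree(n, edges):
    """Theorem 4(e): a graph on vertices 0..n-1 is a tree exactly when it
    is connected and has n - 1 edges."""
    if n == 0 or len(edges) != n - 1:
        return False
    parent = list(range(n))             # union-find forest over the vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps chains short
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                    # this edge joins two components
            parent[ru] = rv
            components -= 1
    return components == 1
```

The edge count alone does not settle the question. A triangle plus an isolated vertex has 4 vertices and 3 edges, but it is not connected, so `is_tree(4, [(0, 1), (1, 2), (2, 0)])` is False.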


Definition 14 (Forest) A forest is a graph all of whose connected components are trees. 
In particular, a forest with one component is a tree. (Connected components were defined 
following Definition 10.) 


Example 15 (A relation for forests) Suppose a forest has v vertices, e edges and c
(connected) components. What values are possible for the triple of numbers (v, e, c)? It
might seem at first that almost anything is possible, but this is not so. In fact v − c = e
because of Theorem 4(e). Why? Let the forest consist of trees $T_1, \ldots, T_c$ and let the triples
for $T_i$ be $(v_i, e_i, c_i)$. Since a tree is connected, $c_i = 1$. By the theorem, $e_i = v_i - 1$. Since
$v = v_1 + \cdots + v_c$ and $e = e_1 + \cdots + e_c$, we have


$$e = (v_1 - 1) + (v_2 - 1) + \cdots + (v_c - 1) = (v_1 + \cdots + v_c) - c = v - c.$$


Suppose a forest has e = 12 and v = 15. We know immediately that it must be made
up of three trees because c = v − e = 15 − 12 = 3.


Suppose we know that a graph G = (V, E, φ) has v = 15 and c = 3. What is the fewest
edges it could have? For each component of G, we can remove edges one by one until we
cannot remove any more without breaking the component into two components. At this 
point, we are left with each component a tree. Thus we are left with a forest of c = 3 trees 
that still has v = 15 vertices. By our relation v — c = e, this forest has 12 edges. Since we 
may have removed edges from the original graph to get to this forest, the original graph 
has at least 12 edges. 
What is the maximum number of edges that a graph G = (V, E, φ) with v = 15 and
c = 3 could have? Since we allow multiple edges, a graph could have an arbitrarily large
number of edges for a fixed v and c: if e is an edge with φ(e) = {u, v}, add in as many
edges ei with φ(ei) = {u, v} as you wish. Hence we will have to insist that G be a simple
graph.


What is the maximum number of edges that a simple graph G with v = 15 and c = 3
could have? This is a bit trickier. Let’s start with a graph where c is not specified. The
edges in a simple graph are a subset of P2(V) and since P2(V) has $\binom{v}{2}$ elements, a simple
graph with v vertices has at most $\binom{v}{2}$ edges.


Now let’s return to the case when we know there must be three components in our
simple graph. Suppose the numbers of vertices in the components are v1, v2 and v3. Since
there are no edges between components, we can look at each component by itself. Using
the result in the previous paragraph for each component, the maximum number of possible
edges is $\binom{v_1}{2} + \binom{v_2}{2} + \binom{v_3}{2}$. We don’t know v1, v2, v3. All we know is that they are strictly
positive integers that sum to v. It turns out that the maximum occurs when one of the vi is
as large as possible and the others equal 1, but the proof is beyond this course. Thus the
answer is $\binom{v-2}{2}$, which in our case is $\binom{13}{2} = 78$. In general, if there were c components,
c − 1 components would have one vertex each and the remaining component would have
v − (c − 1) = v + 1 − c vertices. Hence there can be no more than $\binom{v+1-c}{2}$ edges.
Reviewing what we’ve done, we see:
• There is no graph G = (V, E, φ) with v − c > e.

• If v − c = e, the graph is a forest of c trees and any such forest will do as an
example.

• If v − c < e, there are many examples, none of which are forests.

• If v − c < e and we have a simple graph, then we must have $e \le \binom{v+1-c}{2}$. □
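The bookkeeping in this example is easy to check by computer. The sketch below assumes the vertices are 0, ..., v − 1 and the edges come as a list of pairs. It counts components with union-find and reports whether v − c = e, that is, whether the graph is a forest. It also computes the bound $\binom{v+1-c}{2}$ that applies to simple graphs. The helper names are hypothetical.

```python
from math import comb


def count_components(v, edges):
    """Number of connected components of a graph on vertices 0..v-1."""
    parent = list(range(v))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    c = v
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                 # edge merges two components
            parent[ra] = rb
            c -= 1
    return c


def summarize(v, edges):
    c = count_components(v, edges)
    e = len(edges)
    return {
        "c": c,
        "forest": v - c == e,                  # equality holds exactly for forests
        "max_simple_edges": comb(v + 1 - c, 2),
    }


# Three paths on 15 vertices: a forest with e = 12 and c = 3.
path_edges = [(i, i + 1) for i in range(14) if i not in (4, 9)]
print(summarize(15, path_edges))   # {'c': 3, 'forest': True, 'max_simple_edges': 78}
```

Every edge that does not merge two components closes a cycle. This is why `v - c == e` identifies forests exactly.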


Recall that decision trees, as we have used them, have some special properties. First, 
they have a starting point. Second, the edges (decisions) out of each vertex are ordered. 
We now formalize these concepts. 


Definition 15 (Rooted graph) A pair (G, v), consisting of a graph G = (V, E, φ) and
a specified vertex v, is called a rooted graph with root v.


Definition 16 (Parent, child, sibling and leaf) Let (T, r) be a rooted tree. If w
is any vertex other than r, let $r = v_0, v_1, \ldots, v_k, v_{k+1} = w$ be the list of vertices on the
unique path from r to w. We call $v_k$ the parent of w and call w a child of $v_k$. Parents
and children are also called fathers and sons. Vertices with the same parent are siblings.
A vertex with no children is a leaf. All other vertices are internal vertices of the tree.


Definition 17 (Rooted plane tree) Let (T,r) be a rooted tree. For each vertex, order 
the children of the vertex. The result is a rooted plane tree, which we abbreviate to RP- 
tree. RP-trees are also called ordered trees. An RP-tree is also called, in certain contexts, 
a decision tree, and, when there is no chance of misunderstanding, simply a tree. 
Since almost all trees in computer science are rooted and plane, computer scientists usually
call a rooted plane tree simply a tree. It’s important to know what people mean! 


Example 16 (A rooted plane tree) Below is a picture of a rooted plane tree T =
(V, E, φ). In this case V = 11 and E = {a, ..., j}. There are no parallel edges or loops,
as required by the definition of an RP-tree. The root is r = 1. For each vertex, there is a
unique path from the root to that vertex. Since φ is an injection, once φ has been defined
(as it is in the picture), that unique path can be specified by the vertex sequence alone.
Thus, the path from the root to 6 is (1, 3, 6). The path from the root to 9 is (1, 3, 6, 9).
Sometimes computer scientists refer to the path from the root to a vertex v as the “stack” 
of v. 


In the tree below, the vertex 6 is the parent of the vertex 9. The vertices 8, 9, 10, and
11 are the children of 6, and they are siblings of each other. The leaves of the tree are 4, 5,
7, 8, 9, 10, and 11. All other vertices (including the root) are internal vertices of the tree. 


Remember, an RP-tree is a tree with added properties. Therefore, it must satisfy (a) 
through (e) of Theorem 4. In particular, T has no cycles. Also, there is a unique path 
between any two vertices (e.g., the path from 5 to 8 is (5,2,1,3,6,8)). Removing any edge 
gives a graph which is not connected (e.g., removing j disconnects T into a tree with 10
vertices and a tree with 1 vertex; removing e disconnects T into a tree with 6 vertices and
one with 5 vertices). Finally, the number of edges (10) is one less than the number of 
vertices. 


Example 17 (Traversing a rooted plane tree) Just as in the case of decision trees, 
one can define the notion of depth first traversals of an RP-tree.


Imagine going around (“traversing”) the above RP-tree following arrows. Start at the root,
1, go down edge a to vertex 2, etc. Here is the sequence of vertices as encountered in this
process: 1, 2, 4, 2, 5, 2, 1, 3, 6, 8, 6, 9, 6, 10, 6, 11, 6, 3, 7, 3, 1. This sequence of
vertices is called the depth first vertex sequence, DFV(T), of the RP-tree T. The number
of times each vertex appears in DFV(T) is one plus the number of children of that vertex.
For edges, the corresponding sequence is a, c, c, d, d, a, b, e, g, g, h, h, i, i, j, j, e, f, f, b.
This sequence is the depth first edge sequence, DFE(T), of the tree. Every edge appears
exactly twice in DFE(T). If the vertices of the RP-tree are read left to right, top to bottom,
we obtain the sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11. This is called the breadth first
vertex sequence, BFV(T). Similarly, the breadth first edge sequence, BFE(T), is a, b, c,
d, e, f, g, h, i, j.


The sequences BFV(T) and BFE(T) are linear orderings of the vertices and edges
of the RP-tree T (i.e., each vertex or edge appears exactly once in the sequence). We
also associate linear orderings with DFV(T) called the preorder sequence of vertices of T,
PREV(T), and the postorder sequence of vertices of T, POSV(T).


PREV(T) = 1, 2, 4, 5, 3, 6, 8, 9, 10, 11, 7 is the sequence of first occurrences of the
vertices of T in DFV(T).


POSV(T) = 4, 5, 2, 8, 9, 10, 11, 6, 7, 3, 1 is the sequence of last occurrences of the vertices
of T in DFV(T).


Notice that the order in which the leaves of T appear, 4, 5, 8, 9, 10, 11, 7, is the same in
both PREV(T) and POSV(T). Can you see why this is always true for any tree? □
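All five sequences can be generated mechanically. The sketch below encodes the tree of Example 16 as a dictionary. Each vertex maps to its ordered list of (edge, child) pairs, with the edge names a through j read off the depth first edge sequence above. The encoding and function names are our own. PREV(T) and POSV(T) are then taken as first and last occurrences in DFV(T), exactly as defined above.

```python
from collections import deque

# Example 16's RP-tree: each vertex maps to its ordered (edge, child) pairs.
TREE = {
    1: [("a", 2), ("b", 3)],
    2: [("c", 4), ("d", 5)],
    3: [("e", 6), ("f", 7)],
    6: [("g", 8), ("h", 9), ("i", 10), ("j", 11)],
}


def children(v):
    return TREE.get(v, [])          # leaves have no entry


def depth_first(v, dfv, dfe):
    """Append DFV and DFE for the subtree rooted at v."""
    dfv.append(v)
    for edge, child in children(v):
        dfe.append(edge)            # going down the edge
        depth_first(child, dfv, dfe)
        dfe.append(edge)            # coming back up the same edge
        dfv.append(v)               # back at v after each child


def breadth_first(root):
    """Return BFV and BFE: vertices and edges level by level, left to right."""
    bfv, bfe, queue = [root], [], deque([root])
    while queue:
        v = queue.popleft()
        for edge, child in children(v):
            bfv.append(child)
            bfe.append(edge)
            queue.append(child)
    return bfv, bfe


dfv, dfe = [], []
depth_first(1, dfv, dfe)
prev = list(dict.fromkeys(dfv))                      # first occurrences
posv = list(reversed(dict.fromkeys(reversed(dfv))))  # last occurrences
bfv, bfe = breadth_first(1)
```

Running this reproduces the sequences listed in the text. `prev` is [1, 2, 4, 5, 3, 6, 8, 9, 10, 11, 7] and `posv` is [4, 5, 2, 8, 9, 10, 11, 6, 7, 3, 1].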


*Example 18 (The number of labeled trees) How many n-vertex labeled trees are 
there? In other words, count the number of trees with vertex set V = n. The answer has 
been obtained in a variety of ways. We will do it by establishing a correspondence between 
trees and functions by using digraphs. 
Suppose f is a function from V to V. We can represent this as a simple digraph (V, E)
where the edges are {(v, f(v)) | v ∈ V}. The function


x      1   2   3   4   5   6   7   8   9  10  11
f(x)   1  10   9   2   8   2   2   5   1   6  11


corresponds to the directed graph 
[Figure: the functional digraph of f.]


Such graphs are called functional digraphs. You should be able to convince yourself that 
a functional digraph consists of cycles (including loops) with each vertex on a cycle being 
the root of a tree of noncyclic edges. The edges of the trees are directed toward the roots. 
In the previous figure, 


• 1 is the root of the tree with vertex set {1, 3, 9},

• 2 is the root of the tree with vertex set {2, 4, 7},

• 5 is the root of the tree with vertex set {5},

• 6 is the root of the tree with vertex set {6},

• 8 is the root of the tree with vertex set {8},

• 10 is the root of the tree with vertex set {10} and

• 11 is the root of the tree with vertex set {11}.
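The decomposition into cycles and attached trees can be computed directly. The sketch below assumes f is stored as a Python dictionary from V to V, here the function tabulated above, and uses hypothetical helper names. It relies on a simple fact: after |V| applications of f, any starting vertex has reached its cycle. From there, the cycle is traced once.

```python
def cyclic_vertices(f):
    """Vertices lying on a cycle (loops included) of the functional digraph of f."""
    on_cycle = set()
    for v in f:
        x = v
        for _ in range(len(f)):      # after |V| steps we are surely on a cycle
            x = f[x]
        on_cycle.add(x)
        y = f[x]
        while y != x:                # walk once around that cycle
            on_cycle.add(y)
            y = f[y]
    return on_cycle


def tree_roots(f):
    """Map each vertex to the cycle vertex that is the root of its tree."""
    cyc = cyclic_vertices(f)
    root = {}
    for v in f:
        x = v
        while x not in cyc:          # follow edges toward the cycle
            x = f[x]
        root[v] = x
    return root


f = {1: 1, 2: 10, 3: 9, 4: 2, 5: 8, 6: 2, 7: 2, 8: 5, 9: 1, 10: 6, 11: 11}
roots = tree_roots(f)
```

Grouping the vertices by `roots[v]` recovers the seven trees in the list above, for example {1, 3, 9} with root 1 and {2, 4, 7} with root 2.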


In a tree, there is a unique path from the vertex 1 to the vertex n. Remove all the
edges on the path and list the vertices on the path, excluding 1 and n, in the order they
are encountered. Interpret this list as a permutation in one line form. Draw the functional
digraph for the cycle form, adding the cycles (1) and (n). Add the trees that are attached
to each of the cycle vertices, directing their edges toward the cycle vertices. Consider the
following figure.


[Figure: a tree with vertex set {1, ..., 11} in which the path from 1 to 11 is 1, 10, 8, 2, 5, 6, 11.]


The one line form is 10, 8, 2, 5, 6. In two line form it is

     2   5   6   8  10
    10   8   2   5   6

Thus the cycle form is (2, 10, 6)(5, 8). When we add the two cycles (1) and (11) to this, draw the
directed graph, and attach the directed trees, we obtain the functional digraph pictured
earlier.


We leave it to you to convince yourself that this gives us a one-to-one correspondence
between trees with V = n and functions f : n → n with f(1) = 1 and f(n) = n. In creating
such a function, there are n choices for each of f(2), ..., f(n − 1). Thus there are $n^{n-2}$
such functions and hence $n^{n-2}$ trees. □
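For small n the count $n^{n-2}$ can be confirmed by brute force. By Theorem 4, a set of n − 1 edges on n vertices forms a tree exactly when it connects all n vertices. So one can test every (n − 1)-element subset of the edges of the complete graph. This checks the answer only. It does not implement the correspondence with functions used above. The function names are hypothetical.

```python
from itertools import combinations


def _find(parent, x):
    """Root of x's component in a union-find array."""
    while parent[x] != x:
        x = parent[x]
    return x


def count_labeled_trees(n):
    """Count trees with vertex set {0, ..., n-1} by testing every set of n - 1 edges."""
    all_edges = list(combinations(range(n), 2))
    count = 0
    for chosen in combinations(all_edges, n - 1):
        parent = list(range(n))
        merges = 0
        for u, v in chosen:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru != rv:
                parent[ru] = rv
                merges += 1
        # n - 1 merging edges join all n vertices, so the edges form a tree
        if merges == n - 1:
            count += 1
    return count


for n in range(2, 7):
    print(n, count_labeled_trees(n), n ** (n - 2))
```

Each printed line shows the brute-force count next to $n^{n-2}$, and the two agree (for instance 16 for n = 4 and 125 for n = 5). The number of subsets grows quickly, so this is only practical for very small n.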
Spanning Trees


Trees are not only important objects of study per se, but are important as special subgraphs 
of general graphs. A spanning tree is one such subgraph. For notational simplicity, we shall 
restrict ourselves to simple graphs, G = (V, E), in the following discussion. The ideas we
discuss extend easily to graphs G = (V, E, φ), even allowing loops.


Definition 18 (Spanning tree) A spanning tree of a (simple) graph G = (V, E) is a 
subgraph T = (V, E’) which is a tree and has the same set of vertices as G. 


Example 19 (Connected graphs and spanning trees) Since a tree is connected, a
graph with a spanning tree must be connected. On the other hand, it is not hard to see that
every connected graph has a spanning tree. Any simple graph G = (V, E) has a subgraph
that is a tree, T' = (V', E'): take V' = {v} to be one vertex and E' empty. Suppose that
T' = (V', E') is a largest such “subtree.” If T' is not a spanning tree, then there is a
vertex w of G that is not a vertex of T'. If G is connected, choose a vertex u in T' and a
path $w = x_1, x_2, \ldots, x_k = u$ from w to u. Let j, 1 < j ≤ k, be the first integer such that
$x_j$ is a vertex of T'. Then adding the edge $\{x_{j-1}, x_j\}$ and the vertex $x_{j-1}$ to T' creates a
subtree of G that is larger than T', a contradiction of the maximality of T'. We have, in
fact, shown that a graph is connected if and only if every maximal subtree is a spanning
tree. Thus we have: A graph is connected if and only if it has a spanning tree. It follows
that, if we had an algorithm that was guaranteed to find a spanning tree whenever such a
tree exists, then this algorithm could be used to decide if a graph is connected. □
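The argument above is constructive. Keep adding an edge that joins the current subtree to a vertex outside it until no such edge exists. Here is a minimal sketch, assuming a simple graph given by vertex and edge lists; the function name is hypothetical. A return value of None means some vertex was never reached. In that case the graph is disconnected, matching the conclusion that a graph is connected exactly when it has a spanning tree.

```python
def spanning_tree(vertices, edges):
    """Grow a subtree one edge at a time; return its edges, or None
    if some vertex is never reached (the graph is disconnected)."""
    vertices = list(vertices)
    if not vertices:
        return []
    in_tree = {vertices[0]}             # start from a one-vertex subtree
    tree_edges = []
    grew = True
    while grew:
        grew = False
        for u, w in edges:
            # an edge with exactly one end in the subtree extends it
            if (u in in_tree) != (w in in_tree):
                in_tree.update((u, w))
                tree_edges.append((u, w))
                grew = True
    return tree_edges if len(in_tree) == len(vertices) else None
```

For a triangle 1, 2, 3 with a pendant vertex 4 attached to 3, the edge {3, 1} is skipped because both of its ends are already in the subtree. The result is [(1, 2), (2, 3), (3, 4)].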


Example 20 (Minimum spanning trees) Suppose we wish to install “lines” to link 
various sites together. A site may be a computer installation, a town, or a factory. A line 
may be a digital communication channel, a rail line, or a shipping route for supplies. We’ll
assume that 


(a) a line operates in both directions; 
(b) it must be possible to get from any site to any other site using lines; 


(c) each possible line has a cost (rental rate, construction cost, or shipping cost) 
independent of each other line’s cost; 


(d) we want to choose lines to minimize the total cost. 


We can think of the sites as vertices V in a (simple) graph, the possible lines as edges E
and the costs as a function from the edges to the positive real numbers. Because of (a)
and (b), the lines E' ⊆ E we actually choose will be such that T = (V, E') is connected.
Because of (d), and since all costs are positive, T will be a spanning tree: if T were not a
tree, then by Theorem 4(d) we could delete some edge, keep T connected, and lower the total
cost. □


We now formalize these ideas in a definition: 


Definition 19 (Weights in a graph) Let G = (V, E) be a simple graph and let λ be
a function from E to the positive real numbers. We call λ(e) the weight of the edge e. If
H = (V', E') is a subgraph of G, then λ(H), the weight of H, is the sum of λ(e') over all
e' ∈ E'.


A minimum weight spanning tree for a connected graph G is a spanning tree T such that
λ(T) ≤ λ(T') whenever T' is another spanning tree.


How can we find a minimum weight spanning tree T? One approach is to construct
T by adding an edge at a time in a greedy way. Since we want to minimize the weight,
“greedy” means keeping the weight of each edge we add as low as possible. Here’s such an
algorithm.


Theorem 5 (Minimum weight spanning tree: Prim’s algorithm) Let G = (V, E)
be a simple graph with edge weights given by λ. If the algorithm stops with V' ≠ V, G
has no spanning tree; otherwise, (V, E') is a minimum weight spanning tree for G.


1. Start: Let E' = ∅ and let V' = {v0} where v0 is any vertex in V.


2. Possible Edges: Let F ⊆ E be those edges f = {x, y} with one vertex in V' and
one vertex not in V'. If F = ∅, stop.


3. Choose Edge Greedily: Let f = {x, y} be such that λ(f) is a minimum over
all f ∈ F. Replace V' with V' ∪ {x, y} and E' with E' ∪ {f}. Go to Step 2.


Proof: We begin with the first part; i.e., if the algorithm stops with V' ≠ V, then G
has no spanning tree. The argument is similar to that used in Example 19. Suppose that
V' ≠ V and that there is a spanning tree. We will prove that the algorithm does not stop
at V'. Choose u ∈ V − V' and v ∈ V'. Since G is connected, there must be a path from u
to v. Each vertex on the path is either in V' or not. Since u ∉ V' and v ∈ V', there must
be an edge f on the path with one end in V' and one end not in V'. But then f ∈ F and
so the algorithm does not stop at V'.


We now prove that, if G has a spanning tree, then (V, E’) is a minimum weight spanning 
tree. One way to do this is by induction: We will prove that at each step there is a minimum 
weight spanning tree of G that contains E’. 


The starting case for the induction is the first step in the algorithm; i.e., E' = ∅. Since
G has a spanning tree, it must have a minimum weight spanning tree. The edges of this 
tree obviously contain the empty set, which is what E’ equals at the start. 


We now carry out the inductive step of the proof. Let V’ and E’ be the values going 
into Step 3 and let f = {x,y} be the edge chosen there. By the induction hypothesis, there 
is a minimum weight spanning tree T of G that contains the edges E’. If it also contains 
the edge f, we are done. Suppose it does not contain f. We will prove that we can replace 
an edge in the minimum weight tree with f and still achieve minimum weight. 


Since T contains all the vertices of G, it contains x and y and, also, some path P from
x to y. Suppose x ∈ V' and y ∉ V'. Then this path must contain an edge e = {u, v} with u ∈ V'
and v ∉ V'. We now prove that removing e from T and then adding f to T will still give
a minimum spanning tree.


By the definition of F in Step 2, e ∈ F and so, by the definition of f, λ(e) ≥ λ(f).
Thus the weight of the tree does not increase. If we show that the result is still a tree, this 
will complete the proof. 
The path P together with the edge f forms a cycle in G. Removing e from P and
adding f still allows us to reach every vertex in P and so the altered tree is still connected.
It is also still a tree because it contains no cycles: adding f created only one cycle and
removing e destroyed it. This completes the proof that the algorithm is correct. □
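The three steps of Theorem 5 translate almost line for line into code. The sketch below is a direct, unoptimized transcription. It rescans every edge in Step 2, so it does roughly |V|·|E| work. It assumes the weight function λ is given as a dictionary from edges (u, w) to positive numbers, and the name `prim` is our own. Practical implementations usually keep the candidate edges in a priority queue instead.

```python
def prim(vertices, weight):
    """Theorem 5 (Prim's algorithm), transcribed step by step.

    vertices: list of vertices of a simple graph.
    weight: dict mapping each edge (u, w) to a positive number (lambda).
    Returns (V', E') when the algorithm stops; if V' != set(vertices),
    the graph has no spanning tree.
    """
    v_prime = {vertices[0]}           # Step 1: V' = {v0}, E' empty
    e_prime = []
    while True:
        # Step 2: F holds the edges with exactly one end in V'
        F = [f for f in weight if (f[0] in v_prime) != (f[1] in v_prime)]
        if not F:
            return v_prime, e_prime
        # Step 3: greedily take an edge of F of least weight
        f = min(F, key=lambda edge: weight[edge])
        v_prime.update(f)
        e_prime.append(f)
```

Take a 4-cycle 1, 2, 3, 4 with weights 1, 2, 1, 3 and a diagonal {1, 3} of weight 2. Starting from 1, the algorithm picks {1, 2}, then {2, 3}, then {3, 4}, for a total weight of 4.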


The algorithm for finding a minimum weight spanning tree that we have just proved
is sometimes referred to as Prim’s Algorithm. A variation on this algorithm, proved in
a similar manner, is called Kruskal’s algorithm. In Kruskal’s algorithm, step 2 of Prim’s
algorithm is changed to:


2'. Possible Edges: Let F ⊆ E be those edges f = {x, y} where x and y do not
belong to the same component of (V, E'). If F = ∅, stop.


Intuitively, f ∉ F if f forms a cycle with some collection of edges from E'. Otherwise, f ∈ F.
This extra freedom is sometimes convenient. Our next example gives much less freedom in
choosing new edges to add to the spanning tree, but produces a type of spanning tree that
is useful in many algorithms applicable to computer science.
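Kruskal's step 2' is usually implemented with a union-find structure that tracks the components of (V, E'). The sketch below scans the edges once in increasing weight order, adding an edge whenever its ends lie in different components. This matches repeatedly taking a minimum-weight edge of F: components only merge, so an edge that fails the test never returns to F. As before, the weights are assumed to be a dictionary from edges to positive numbers, and the name `kruskal` is our own.

```python
def kruskal(vertices, weight):
    """Kruskal's variant: repeatedly add a minimum-weight edge whose ends
    lie in different components of (V, E').  Returns the list E'."""
    parent = {v: v for v in vertices}     # union-find over components of (V, E')

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    e_prime = []
    # Increasing weight order: an edge rejected once stays out of F forever,
    # because components only ever merge.
    for f in sorted(weight, key=weight.get):
        rx, ry = find(f[0]), find(f[1])
        if rx != ry:                      # f is in F
            parent[rx] = ry
            e_prime.append(f)
    return e_prime
```

On the weighted 4-cycle with a diagonal used for Prim's algorithm, this picks {1, 2} and {3, 4} first and then {2, 3}. The total weight is again 4. On a disconnected graph it returns a spanning forest rather than a tree.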


Example 21 (Algorithm for lineal or depth-first spanning trees) We start with
a rooted simple graph G = (V, E) with v0 as root. The algorithmic process constructs a
spanning tree rooted at v0. It follows the same general form as Theorem 5. The weights,
if there are any, are ignored.


1. Start: Let E' = ∅ and let V' = {v0} where v0 is the root of G. Let T' = (V', E')
be the starting subtree, rooted at v0.


2. Possible New Edge: Let v be the last vertex added to V' where T' = (V', E')
is the subtree thus far constructed, with root v0. Let x be the first vertex on the
unique path from v to v0 for which there is an edge f = {x, y} with x ∈ V' and
y ∉ V'. If there is no such x, stop.


3. Add Edge: Replace V' with V' ∪ {y} and E' with E' ∪ {f} to obtain T' = (V', E')
as the new subtree thus far constructed, with root v0. (Note: y is now the last
vertex added to V'.) Go to Step 2.
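Here is a literal transcription of the three steps, as a sketch under stated assumptions. The graph is a dictionary from each vertex to a list of neighbours. When several new vertices are allowed, the earliest one in the list is taken, which stands in for the text's "left or upward" rule. Parent pointers give the unique path from the last vertex added back to the root. The function name is hypothetical.

```python
def lineal_spanning_tree(adj, root):
    """Example 21's algorithm for a lineal (depth-first) spanning tree.

    adj: dict mapping each vertex to a list of neighbours; ties are broken
    by taking the earliest allowed neighbour in the list (an assumption).
    Returns (order, parent): vertices in the order added, and the parent of
    each vertex in the tree (None for the root).
    """
    order = [root]                    # Step 1: V' = {root}, E' empty
    parent = {root: None}
    while True:
        # Step 2: walk from the last vertex added back toward the root
        x = order[-1]
        found = None
        while x is not None and found is None:
            for y in adj[x]:
                if y not in parent:   # edge {x, y} with y not in V'
                    found = (x, y)
                    break
            else:
                x = parent[x]         # no new edge at x: move one step up
        if found is None:
            return order, parent
        # Step 3: add y and the edge {x, y}; y becomes the last vertex added
        x, y = found
        parent[y] = x
        order.append(y)
```

For the 4-cycle 0, 1, 2, 3 with the extra edge {1, 3}, rooted at 0, the tree is the path 0, 1, 2, 3. Both remaining edges, {0, 3} and {1, 3}, join a vertex to one of its ancestors.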


Here is an example. We are going to find a lineal spanning tree for the graph below, 
root a. The result is shown on the right where the original vertices have been replaced by 
the order in which they have been added to the “tree thus far constructed” in the algorithm. 


When there is a choice, we choose the left or upward vertex. For example, at the start,
when b, d, e and g are all allowed, we choose b. When vertex 2 was added, the path to the
root was (2, 1, 0). We went along this path towards the root and found that at 1, a new
edge to 3 could be added. Now the path to the root became (3, 1, 0) and we had to go all
of the way to 0 to add a new edge (the edge to 4). You should go through the rest of the
algorithm. Although there are some choices, the basic rule of step 2 of the algorithm must
always be followed.


There are two extremely important properties that this algorithm has:


1. When the rooted spanning tree T’ for G has been constructed, there may be edges 
of G not in the spanning tree. In the above picture, there are three such edges, 
indicated by dashed lines. If {x, y} is such an edge, then either x lies on the path 
from y to the root or the other way around. For example, the edge {4,7} in the 
example has 4 on the path from 7 to the root 0. This is the “lineal” property from 
which the spanning trees of this class get their name. 


2. If, when the rooted spanning tree T' has been constructed, the vertices of T' are
labeled in the order added by the algorithm AND the children of each vertex of
T' are ordered by the same numbering, then an RP-tree is the result. For this RP-tree,
the numbers on the vertices correspond to preorder, PREV(T'), of vertices
on this tree (starting with the root having value 0). Check this out for the above
example.


We will not prove that the algorithm we have presented has properties 1 and 2. We 
leave it to you to study the example, construct other examples, and come to an intuitive 
understanding of these properties. □


Property 1 in the preceding example is the basis for the formal definition of a lineal 
spanning tree: 


Definition 20 (Lineal or depth-first spanning tree) Let x and y be two vertices in
a rooted tree with root r. If x is on the path connecting r to y, we say that y is a descendant
of x. (In particular, all vertices are descendants of r.) If one of u and v is a descendant of
the other, we say that {u, v} is a lineal pair. A lineal spanning tree or depth-first spanning
tree of a connected graph G = (V, E) is a rooted spanning tree of G such that each edge
{u, v} of G is a lineal pair.


In our example, the vertices 6 and 7 do not form a lineal pair relative to the rooted tree
constructed. But {4,7}, which is an edge of G, is a lineal pair. Trivially, the vertices of any
edge of the tree T form a lineal pair.
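As an illustration, here is a minimal Python sketch of the construction, assuming the graph is connected and given as a dictionary `adj` that maps each vertex to a list of its neighbors; the order of each list stands in for the text's "left or upward" tie-breaking rule. The names `lineal_spanning_tree` and `is_lineal_pair` are our own. The sketch numbers the vertices in the order they are added (property 2) and lets you test property 1 edge by edge.

```python
def lineal_spanning_tree(adj, root):
    """Build a depth-first (lineal) spanning tree of a connected graph.

    Returns (parent, order): parent[v] is the parent of v in the tree
    (None for the root) and order[v] is the position at which v was added,
    starting with 0 for the root.
    """
    parent = {root: None}
    order = {root: 0}
    path = [root]  # path from the root to the most recently added vertex
    while path:
        x = path[-1]
        # Try to add an edge from x to a vertex not yet in the tree.
        new = next((y for y in adj[x] if y not in order), None)
        if new is None:
            path.pop()  # nothing new at x: move one step toward the root
        else:
            parent[new] = x
            order[new] = len(order)
            path.append(new)
    return parent, order


def is_lineal_pair(parent, u, v):
    """True if u lies on the tree path from v to the root, or vice versa."""
    def on_path_to_root(x, target):
        while x is not None:
            if x == target:
                return True
            x = parent[x]
        return False
    return on_path_to_root(v, u) or on_path_to_root(u, v)


# A 4-cycle 0-1-2-3-0 with the chord {0, 2}.
adj = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
parent, order = lineal_spanning_tree(adj, 0)
print(order)  # {0: 0, 1: 1, 2: 2, 3: 3}
print(all(is_lineal_pair(parent, u, v) for u in adj for v in adj[u]))  # True
```

Keeping an explicit path to the root mirrors the description in the example: after each new vertex is added we look for a new edge from it, and only when none exists do we move back toward the root.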


We close this section by proving a theorem using lineal spanning trees. We don't
"overexplain" this theorem, in order to encourage you to think about the properties of lineal
spanning trees that make the proof much simpler than what we might have come up with
without lineal spanning trees. Recall that a graph G = (V, E) is called bipartite if V can be
partitioned into two sets C and S such that each edge has one vertex in C and one vertex
in S (Exercises for Section 2).


Theorem 6 (Bipartite and cycle lengths) Let G = (V, E) be a simple graph. G is
bipartite if and only if every cycle has even length.
Proof: Suppose G is bipartite with partition {C, S}, and label each vertex with its block.
Since every edge joins C to S, the labels must alternate as we go around any cycle. For
example, if (x1, x2, x3) are the vertices, in order, of a cycle of length three, then the block
labels (starting with C) would be (C, S, C). This would mean that the edge {x1, x3} has
both vertices in block C, which violates the definition of a bipartite graph. Since this
problem happens for any cycle of odd length, a bipartite graph can never contain a cycle of
odd length.


To prove the converse, we must show that if every cycle of G has even length, then G
is bipartite. Suppose every cycle of G has even length. (If G is not connected, apply the
following argument to each component and combine the blocks.) Choose a vertex v0 as root
of G and construct a lineal spanning tree T for G with root v0. Label the root v0 of T with C,
all vertices of T of distance 1 from v0 with S, all of distance 2 from v0 with C, etc. Put
vertices labeled C into block C of a partition {C, S} of V, and put all other vertices into
block S. If f = {x, y} is an edge of T, then x and y are in different blocks of the partition
{C, S} by construction. If f = {x, y} is an edge of G not in T, then the two facts (1) T is
lineal and (2) every cycle has even length imply that x and y are in different blocks of the
partition {C, S}. This completes the proof. □
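The proof suggests a test for bipartiteness. The sketch below assumes the graph is given as a dictionary `adj` mapping each vertex to a list of its neighbors; the function name `bipartite_blocks` is ours. It grows a depth-first spanning tree of each component, puts a vertex in C or S according to whether its distance from the root is even or odd, and then checks every edge. If some edge has both ends in the same block, the lineal property means it closes a cycle of odd length, so the graph is not bipartite.

```python
def bipartite_blocks(adj):
    """Return (C, S) as in the proof of Theorem 6, or None if G is not bipartite.

    adj maps each vertex to a list of its neighbors (simple graph).
    """
    depth = {}
    for root in adj:                      # one lineal tree per component
        if root in depth:
            continue
        depth[root] = 0
        path = [root]                     # path from root to newest vertex
        while path:
            x = path[-1]
            new = next((y for y in adj[x] if y not in depth), None)
            if new is None:
                path.pop()                # back up toward the root
            else:
                depth[new] = depth[x] + 1
                path.append(new)
    C = {v for v in adj if depth[v] % 2 == 0}   # even distance from the root
    S = set(adj) - C                            # odd distance from the root
    # Tree edges always join C and S. Because the tree is lineal, an edge with
    # both ends in one block joins a vertex to a descendant at even distance,
    # which closes a cycle of odd length.
    for u in adj:
        for v in adj[u]:
            if (u in C) == (v in C):
                return None
    return C, S


print(bipartite_blocks({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}))  # ({0, 2}, {1, 3})
print(bipartite_blocks({0: [1, 2], 1: [0, 2], 2: [0, 1]}))             # None
```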


Exercises for Section 3 


3.1. In this exercise, we study how counting edges and vertices in a graph can establish
that cycles exist. For parts (a) and (b), let G = (V, E, φ) be a graph with loops
allowed.


(a) Using induction on n, prove: 
If n > 0, G is connected and G has v vertices and v + n edges, then G has at 
least n + 1 cycles. 


(b) Prove that, if G has v vertices, e edges and c components, then G has at least
c + e − v cycles.
Hint: Use (a) for each component.


(c) Show that (a) is best possible, even for simple graphs. In other words, for each
n construct a simple graph that has n more edges than vertices but has only
n + 1 cycles.


3.2. Let T = (V, E) be a tree and let d(v) be the degree of a vertex v.
(a) Prove that $\sum_{v \in V} (2 - d(v)) = 2$.


(b) Prove that, if T has a vertex of degree m > 2, then it has at least m vertices 
of degree 1. 


(c) Give an example for all m > 2 of a tree with a vertex of degree m and only m 
leaves. 


3.3. Give an example of a graph that satisfies the specified condition or show that no
such graph exists.


(a) A tree with six vertices and six edges.


(b) A tree with three or more vertices, two vertices of degree one and all the other
vertices with degree three or more.


(c) A disconnected graph with 10 vertices and 8 edges.


(d) A disconnected graph with 12 vertices and 11 edges and no cycle.


(e) A tree with 6 vertices and the sum of the degrees of all vertices is 12.


(f) A connected graph with 6 edges, 4 vertices, and exactly 2 cycles.


(g) A graph with 6 vertices, 6 edges and no cycles.


3.4. The height of a rooted tree is the maximum height of any leaf. The length of the
unique path from a leaf of the tree to the root is, by definition, the height of that
leaf. A rooted tree in which each non-leaf vertex has at most two children is called
a binary tree. If each non-leaf vertex has exactly two children, the tree is called a
full binary tree.


(a) If a binary tree has l leaves and height h, prove that l ≤ 2^h. (Taking logarithms
gives log_2(l) ≤ h.)


(b) A binary tree has l leaves. What can you say about the maximum value of h?


(c) Given a full binary tree with l leaves, what is the maximum height h?


(d) Given a full binary tree with l leaves, what is the minimum height h?


(e) Given a binary tree of l leaves, what is the minimum height h?


3.5. In each of the following cases, state whether or not such a tree is possible.


(a) A binary tree with 35 leaves and height 100.


(b) A full binary tree with 21 leaves and height 21.


(c) A binary tree with 33 leaves and height 5.


(d) A rooted tree of height 5 where every internal vertex has 3 children and there
are 365 vertices.


3.6. For each of the following graphs:


[Figure: three graphs, labeled (1), (2) and (3), each on the vertices A, B, C and D.]


(a) Find all spanning trees.
(b) Find all spanning trees up to isomorphism.


(c) Find all depth-first spanning trees rooted at A.


(d) Find all depth-first spanning trees rooted at B.


3.7. For each of the following graphs:


[Figure: three edge-weighted graphs, labeled (1), (2) and (3), each on the vertices A, B, C and D.]


(a) Find all minimum spanning trees.
(b) Find all minimum spanning trees up to isomorphism.


(c) Among all depth-first spanning trees rooted at A, find those of minimum
weight.


(d) Among all depth-first spanning trees rooted at B, find those of minimum
weight.


3.8. In the following graph, the edges are weighted either 1, 2, 3, or 4.


Referring to Theorem 5 and the discussion of Kruskal's algorithm following it:
(a) Find a minimum spanning tree using Prim's algorithm.
(b) Find a minimum spanning tree using Kruskal's algorithm.


(c) Find a depth-first spanning tree rooted at K.


Section 4: Rates of Growth and Analysis of Algorithms 


Suppose we have an algorithm and someone asks us “How good is it?” To answer that 
question, we need to know what they mean. They might mean “Is it correct?” or “Is it 
understandable?” or “Is it easy to program?” We won’t deal with any of these. 


They also might mean “How fast is it?” or “How much space does it need?” These 
two questions can be studied by similar methods, so we'll just focus on speed. Even 
now, the question is not precise enough. Does the person mean “How fast is it on this 
particular problem and this particular machine using this particular code and this particular
compiler?” We could answer this simply by running the program! Unfortunately, that 
doesn’t tell us what would happen with other machines or with other problems that the 
algorithm is designed to handle. 


We would like to answer a question such as “How fast is Algorithm 1 for finding a 
spanning tree?” in such a way that we can compare that answer to “How fast is Algorithm 
2 for finding a spanning tree?” and obtain something that is not machine or problem 
dependent. At first, this may sound like an impossible goal. To some extent it is; however, 
quite a bit can be said. 


How do we achieve machine independence? We think in terms of simple machine 
operations such as multiplication, fetching from memory and so on. If one algorithm 
uses fewer of these than another, it should be faster. Those of you familiar with computer 
instruction timing will object that different basic machine operations take different amounts 
of time. That’s true, but the times are not wildly different. Thus, if one algorithm uses a 
lot fewer operations than another, it should be faster. It should be clear from this that we 
can be a bit sloppy about what we call an operation; for example, we might call something 
like x = a + b one operation. On the other hand, we can't be so sloppy that we call
x = a_1 + ··· + a_n one operation if n is something that can be arbitrarily large.


Example 22 (Finding the maximum) Let's look at how long it takes to find the
maximum of a list of n integers where we know nothing about the order they are in or how
big the integers are. Let a_1, ..., a_n be the list of integers. Here's our algorithm for finding
the maximum.


max = a_1
For i = 2, ..., n
    If a_i > max, then max = a_i.
End for
Return max


Being sloppy, we could say that the entire comparison and replacement in the "If" takes
an operation and so does the stepping of the index i. Since this is done n − 1 times, we
get 2n − 2 operations. There are some setup and return operations, say s, giving a total of
2n − 2 + s operations. Since all this is rather sloppy, all we can really say is that for large
n and actual code on an actual machine, the procedure will take about Cn "ticks" of the
machine's clock. Since we can't determine C by our methods, it will be helpful to have
a notation that ignores it. We use Θ(f(n)) to designate any function that behaves like a
constant times f(n) for arbitrarily large n. Thus we would say that the "If" takes time
Θ(n) and the setup and return takes time Θ(1). Thus the total time is Θ(n) + Θ(1). Since
n is much bigger than 1 for large n, the total time is Θ(n). □
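To make the operation count concrete, here is a small Python sketch. The function name `find_max` and the charge of two operations per pass are our own illustrative choices, following the "sloppy" count in the text; setup and return are not counted. For a list of length n it reports 2n − 2 operations.

```python
def find_max(a):
    """Return (max(a), operations) for a non-empty list a.

    Each pass of the loop is charged two operations: one for the
    comparison and possible replacement, one for stepping the index.
    Setup and return are not counted.
    """
    maximum = a[0]
    operations = 0
    for x in a[1:]:          # i = 2, ..., n
        if x > maximum:
            maximum = x
        operations += 2
    return maximum, operations


print(find_max([3, 1, 4, 1, 5, 9, 2, 6]))  # (9, 14), and 14 = 2*8 - 2
```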


We need to define Θ more precisely and list its most important properties. We will
also find it useful to define O, read "big oh."


Definition 21 (Notation for Θ and O) Let f, g and h be functions from the positive
integers to the nonnegative real numbers. We say that g(n) is Θ(f(n)) if there exist positive
constants A and B such that Af(n) ≤ g(n) ≤ Bf(n) for all sufficiently large n. In this case
we say that f and g grow at the same rate. We say that h(n) is O(f(n)) if there exists a
positive constant B such that h(n) ≤ Bf(n) for all sufficiently large n. In this case we say
that h grows no faster than f or, equivalently, that f grows at least as fast as h.


The phrase "S(n) is true for all sufficiently large n" means that there is some integer N
such that S(n) is true whenever n > N. Saying that something is Θ(f(n)) gives an idea
of how big it is for large values of n. Saying that something is O(f(n)) gives an idea of an
upper bound on how big it is for all large values of n. (We said "idea of" because we don't
know what the constants A and B are.)
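For a concrete instance of the definition (our own example, not one from the text), take g(n) = 3n^2 + 5n + 2 and f(n) = n^2. One choice of constants that works is A = 3 and B = 4, with "sufficiently large" meaning n ≥ 6:

```latex
\begin{align*}
3n^2 &\le 3n^2 + 5n + 2 \quad \text{for all } n \ge 1,\\
3n^2 + 5n + 2 \le 4n^2 &\iff n^2 - 5n - 2 \ge 0 \iff n(n-5) \ge 2,
\end{align*}
```

and n(n − 5) ≥ 6 · 1 = 6 ≥ 2 for every n ≥ 6. Hence 3n^2 + 5n + 2 is Θ(n^2). Any larger B (or smaller positive A) would also do; the definition only asks that some such constants exist.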


Theorem 7 (Some properties of Θ and O) We have


(a) If g(n) is Θ(f(n)), then g(n) is O(f(n)).

(b) f(n) is Θ(f(n)) and f(n) is O(f(n)).

(c) If g(n) is Θ(f(n)) and C and D are positive constants, then Cg(n) is Θ(Df(n)).
If g(n) is O(f(n)) and C and D are positive constants, then Cg(n) is O(Df(n)).

(d) If g(n) is Θ(f(n)), then f(n) is Θ(g(n)).

(e) If g(n) is Θ(f(n)) and f(n) is Θ(h(n)), then g(n) is Θ(h(n)).
If g(n) is O(f(n)) and f(n) is O(h(n)), then g(n) is O(h(n)).

(f) If g_1(n) is Θ(f_1(n)) and g_2(n) is Θ(f_2(n)), then g_1(n) + g_2(n) is Θ(max(f_1(n), f_2(n))).
If g_1(n) is O(f_1(n)) and g_2(n) is O(f_2(n)), then g_1(n) + g_2(n) is O(max(f_1(n), f_2(n))).

Note that as a consequence of properties (b), (d) and (e) above, the statement "g(n) is
Θ(f(n))" defines an equivalence relation on the set of functions from the positive integers
to the nonnegative reals. As with any equivalence relation, we can think of it globally as a
partition into equivalence classes or locally as a relation between pairs of elements in the set
on which the equivalence relation is defined. In the former sense, "g(n) is Θ(f(n))" means
that "g(n) belongs to the equivalence class Θ(f(n)) associated with f." In the latter sense,
"g(n) is Θ(f(n))" means g ∼Θ f, where ∼Θ is an equivalence relation called "is Θ."


Proof: Most of the proofs are left as an exercise. We'll do (e) for Θ. We are given that
there are positive constants A_1, B_1, A_2 and B_2 such that

$$A_1 f(n) \le g(n) \le B_1 f(n) \quad\text{and}\quad A_2 h(n) \le f(n) \le B_2 h(n)$$

for all sufficiently large n. It follows that

$$A_1 A_2 h(n) \le A_1 f(n) \le g(n) \le B_1 f(n) \le B_1 B_2 h(n)$$

for all sufficiently large n. With A = A_1 A_2 and B = B_1 B_2, it follows that g(n) is
Θ(h(n)). □
Example 23 (Additional observations on Θ and O) In this example, we have collected
some additional information about our notation.


Functions which are not always positive. Our definitions of Θ and O are only for
functions whose values are nonnegative. The definitions can be extended to arbitrary
functions by using absolute values; e.g., A|f(n)| ≤ |g(n)| ≤ B|f(n)| means g(n) = Θ(f(n)).
All the results in the theorem still hold except (f) for Θ. This observation is most often
applied to the case where the function f is "eventually" nonnegative (∃M such that ∀n >
M, f(n) ≥ 0). This is the case, for example, with any polynomial in n with a positive
coefficient for the highest power of n.


Taking limits. When comparing two well-behaved functions f(n) and g(n), limits can
be helpful:

$$\lim_{n\to\infty} \frac{g(n)}{f(n)} = C > 0 \quad\text{implies}\quad g(n) \text{ is } \Theta(f(n))$$

and

$$\lim_{n\to\infty} \frac{g(n)}{f(n)} = C \ge 0 \quad\text{implies}\quad g(n) \text{ is } O(f(n)).$$

We assume here that the function f is never zero past some integer N so that the ratio is
defined. The constants A and B of the definition can, in the first case, be taken to be C − ε
and C + ε, where ε is any positive number less than C (ε = C/2 is a simple choice). In the
second case, take B to be C + ε. If, in the first case, C = 1, then f and g are said to be
asymptotic or asymptotically equal. This is written f ∼ g. If, in the second case, C = 0,
then g is said to be little oh of f (written g = o(f)). We will not use the "asymptotic" and
"little oh" concepts.


Polynomials. In particular, you can take any polynomial as f(n), say f(n) = a_k n^k +
··· + a_0, and any other polynomial of the same degree as g(n), say g(n) = b_k n^k + ··· + b_0.
For f and g to be eventually positive we must have both a_k and b_k positive. If that is so,
then g(n) is Θ(f(n)). Note in particular that we must have g(n) is Θ(n^k).


Logarithms. Two questions that arise concerning logarithms are (a) "What base should
I use?" and (b) "How fast do they grow?"

The base does not matter because log_a x = (log_a b)(log_b x), and constant factors like
log_a b are ignored in Θ( ) and O( ).

It is known from calculus that log n → ∞ as n → ∞ and that lim_{n→∞} (log n)/n^ε = 0
for every ε > 0. Thus logarithms grow, but they grow slower than powers of n. For example,
n log n is O(n^{3/2}) but n^{3/2} is not O(n log n).
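A quick numerical look can make these two facts tangible, though of course it proves nothing about limits. The sketch below (the function names are ours) checks the change-of-base identity and tabulates (log n)/n^ε for ε = 0.5 and ε = 0.1, using natural logarithms. For ε = 0.1 the ratio actually increases at first (it peaks near n = e^10) before falling toward 0, a reminder that these statements are about sufficiently large n.

```python
import math


def log_over_power(n, eps):
    """(log n) / n**eps, which tends to 0 as n grows, for every eps > 0."""
    return math.log(n) / n ** eps


def change_base(x, a, b):
    """log_a x computed as (log_a b) * (log_b x)."""
    return math.log(b, a) * math.log(x, b)


# The base only changes log values by the constant factor log_a b.
print(change_base(1000.0, 2, 10), math.log2(1000.0))

for k in (2, 4, 8, 16, 32):
    n = 10.0 ** k
    print(f"n = 1e{k}: eps=0.5 -> {log_over_power(n, 0.5):.4f}, "
          f"eps=0.1 -> {log_over_power(n, 0.1):.4f}")
```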


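A proof. How do we prove

$$\lim_{n\to\infty} \frac{g(n)}{f(n)} = C > 0 \quad\text{implies}\quad g(n) \text{ is } \Theta(f(n))?$$

By definition, the limit statement means that for any ε > 0 there exists N such that for all
n > N, |g(n)/f(n) − C| < ε. If ε ≥ C, replace it with a smaller ε. From |g(n)/f(n) − C| < ε,
for all n > N,

$$C - \varepsilon < \frac{g(n)}{f(n)} < C + \varepsilon \quad\text{or}\quad (C - \varepsilon) f(n) < g(n) < (C + \varepsilon) f(n).$$

Take A = C − ε and B = C + ε in the definition of Θ. □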
Example 24 (Using Θ) To illustrate these ideas, we'll consider three algorithms for
evaluating a polynomial p(x) of degree n at some point r; i.e., computing p_0 + p_1 r + ··· + p_n r^n.
We are interested in how fast they are when n is large. Here are the procedures. You should
convince yourself that they work.


Poly1(n, p, r)
    S = p_0
    For i = 1, ..., n: S = S + p_i * Pow(r, i).
    Return S
End

Pow(r, i)
    P = 1
    For j = 1, ..., i: P = P * r.
    Return P
End

Poly2(n, p, r)
    S = p_0
    P = 1
    For i = 1, ..., n
        P = P * r.
        S = S + p_i * P.
    End for
    Return S
End

Poly3(n, p, r)
    S = p_n
    For i = n, ..., 2, 1: S = S * r + p_{i-1}
    Return S
End


Let T_n(Name) be the time required for the procedure Name. Let's analyze Poly1. The
"For" loop in Pow(r, i) is executed i times and so takes Ci operations for some constant C.
The setup and return in Pow takes some constant number of operations D. Thus T_i(Pow) =
Ci + D operations. As a result, the ith iteration of the "For" loop in Poly1 takes Ci + E
operations for some constant E > D. Adding this over i = 1, 2, ..., n, we see that
the total time spent in the "For" loop is Θ(n^2) since Σ_{i=1}^{n} i = n(n + 1)/2. (This requires
using some of the properties of Θ. You should write out the details.) Since the rest of
Poly1 takes Θ(1) time, T_n(Poly1) is Θ(n^2).


Each iteration of the "For" loop of Poly2 takes a constant amount of time, and the loop is
executed n times. It follows that T_n(Poly2) is Θ(n). The same analysis applies to Poly3.


What can we conclude from this about the comparative speed of the algorithms?
By the definition of Θ, there are positive reals A and B so that An^2 ≤ T_n(Poly1) and
T_n(Poly2) ≤ Bn for sufficiently large n. Thus T_n(Poly2)/T_n(Poly1) ≤ B/(An). As n gets
larger, Poly2 looks better and better compared to Poly1.


Unfortunately, the crudeness of Θ does not allow us to make any distinction between
Poly2 and Poly3. What we can say is that T_n(Poly2) is Θ(T_n(Poly3)); i.e., T_n(Poly2) and
T_n(Poly3) grow at the same rate. A more refined estimate can be obtained by counting the
actual number of operations involved. □
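Counting operations is easy to try by machine. The following Python sketch implements the three procedures, with a coefficient list p = [p_0, ..., p_n] taking the place of the separate argument n; the names `poly1`, `poly2`, `poly3` and `pow_` are ours. Each function returns the value together with the number of multiplications it performed. Poly1 uses n(n + 1)/2 + n multiplications, Poly2 uses 2n and Poly3 uses n, which is exactly the kind of refinement that Θ cannot express.

```python
def pow_(r, i):
    """Pow(r, i): returns (r**i, multiplications used)."""
    P, mults = 1, 0
    for _ in range(i):
        P *= r
        mults += 1
    return P, mults


def poly1(p, r):
    """Recompute each power from scratch: Theta(n^2) multiplications."""
    n = len(p) - 1
    S, mults = p[0], 0
    for i in range(1, n + 1):
        power, m = pow_(r, i)
        S += p[i] * power
        mults += m + 1            # i for Pow, 1 for p_i * Pow(r, i)
    return S, mults


def poly2(p, r):
    """Keep the running power P = r**i: 2n multiplications."""
    n = len(p) - 1
    S, P, mults = p[0], 1, 0
    for i in range(1, n + 1):
        P *= r
        S += p[i] * P
        mults += 2
    return S, mults


def poly3(p, r):
    """Horner's rule, working down from p_n: n multiplications."""
    n = len(p) - 1
    S, mults = p[n], 0
    for i in range(n, 0, -1):
        S = S * r + p[i - 1]
        mults += 1
    return S, mults


p = [1, 2, 3]                      # 1 + 2x + 3x^2
print(poly1(p, 2), poly2(p, 2), poly3(p, 2))  # (17, 5) (17, 4) (17, 2)
```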


So far we have talked about how long an algorithm takes to run as if this were a simple, 
clear concept. In the next example we'll see that there’s an important point that we’ve 
ignored. 


*Example 25 (What is average running time?) Let's consider the problem of
(a) deciding whether or not a simple graph can be properly colored with four colors
and, (b) if a proper coloring exists, producing one. A proper coloring of a simple graph
G = (V, E) is a function λ: V → C, the set of "colors," such that, if {u, v} is an edge, then
λ(u) ≠ λ(v). We may as well assume that V = {1, 2, ..., n} and that the colors are c_1, c_2, c_3 and c_4.


Here's a simple algorithm to determine a λ by using backtracking to go lexicographically
through possible colorings λ(1), λ(2), ..., λ(n).


1. Initialize: Set v = 1 and λ(1) = c_1.


2. Advance in decision tree: If v = n, stop with λ determined; otherwise, set
v = v + 1 and λ(v) = c_1.


3. Test: If λ(i) ≠ λ(v) for all i < v for which {i, v} ∈ E, go to Step 2.


4. Select next decision: Let j be such that λ(v) = c_j. If j < 4, set λ(v) = c_{j+1}
and go to Step 3.


5. Backtrack: If v = 1, stop with coloring impossible; otherwise, set v = v − 1 and
go to Step 4.
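Here is one way to write Steps 1 to 5 in Python, as a sketch. We assume the vertices are 1, ..., n, represent the colors c_1, ..., c_k by the integers 1, ..., k (k = 4 in the text), and count every assignment λ(v) = c_j, since that is the quantity discussed below. The function name `color_backtrack` is hypothetical.

```python
def color_backtrack(n, edges, k=4):
    """Lexicographic backtracking search for a proper k-coloring of a simple
    graph on vertices 1..n. Returns (coloring, assignments): coloring is the
    list [lam(1), ..., lam(n)] of colors 1..k, or None if none exists, and
    assignments counts every assignment lam(v) = c_j that was made."""
    earlier = {v: [] for v in range(1, n + 1)}   # neighbors i < v of each v
    for u, w in edges:
        u, w = min(u, w), max(u, w)
        earlier[w].append(u)
    lam = [0] * (n + 1)                 # lam[v] = 0 means "not yet colored"
    v, lam[1], assignments = 1, 1, 1    # Step 1: initialize
    while True:
        # Step 3 (test): does lam(v) differ from every earlier neighbor's color?
        if all(lam[i] != lam[v] for i in earlier[v]):
            if v == n:                  # Step 2: stop with lam determined
                return lam[1:], assignments
            v += 1                      # Step 2: advance
            lam[v] = 1
            assignments += 1
            continue
        # Steps 4 and 5: try the next color, backtracking when colors run out.
        while lam[v] == k:
            lam[v] = 0
            v -= 1                      # Step 5: backtrack
            if v == 0:
                return None, assignments   # coloring impossible
        lam[v] += 1                     # Step 4: select next decision
        assignments += 1


triangle = [(1, 2), (2, 3), (1, 3)]
print(color_backtrack(3, triangle))  # ([1, 2, 3], 6)
```

The test at vertex 1 passes trivially because it has no earlier neighbors, so the loop can begin directly with Step 3.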


How fast is this algorithm? Obviously it will depend on the graph. Here are two
extreme cases:


• Suppose the subgraph induced by the first five vertices is the complete graph K_5 (i.e.,
all of the ten possible edges are present). The algorithm stops after trying to color the
first five vertices and discovering that there is no proper coloring. Thus the running
time does not depend on n and so is in Θ(1).


• Suppose that the first n − 5 vertices have no edges and that the last five vertices induce
K_5. The algorithm tries all possible assignments of colors to the first n − 5 vertices
and, for each of them, discovers that it cannot properly color the last five because they
form K_5. Thus the algorithm makes between 4^{n−5} and 4^n assignments of colors and
so its running time is Θ(4^n), a much faster growing time than Θ(1).


What should we do about studying the running time of such an algorithm? It's
reasonable to talk about the average time the algorithm takes if we expect to give it lots of
graphs to look at. Most n-vertex graphs will have many sets of five vertices that induce K_5.
(We won't prove this.) As a result, the algorithm has running time in Θ(1) for most graphs.
In fact, it can be proved that the average number of assignments of the form λ(v) = c_k that
are made is Θ(1) and so the average running time is Θ(1). This means that the average
running time of the algorithm is bounded for all n, which is quite good!


Now suppose you give this algorithm to a friend, telling her that the average running
time is bounded. She thanks you profusely for such a wonderful algorithm and puts it to
work coloring randomly generated "planar" graphs. These are a special class of graphs
whose pictures can be drawn in the plane without edges crossing each other. (All trees
are planar, but K_5 is not planar.) By a famous theorem called the Four Color Theorem,
every planar graph can be properly colored with four colors, so the algorithm will find the
coloring. To do so it must make an assignment of the form λ(v) = c_k for each vertex v. Thus
it must make at least n assignments. (Actually it will almost surely make many, many
more.) Your friend soon comes back to you complaining that your algorithm takes a long
time to run. What went wrong?


You were averaging over all simple graphs with n vertices. Your friend was averaging 
over all simple planar graphs with n vertices. The average running times are very different! 
There is a lesson here: 


You must be VERY clear what you are averaging over. 


Because situations like this do occur in real life, computer scientists are careful to specify
what kind of running time they are talking about: either the average of the running time
over some reasonable, clearly specified set of problems or the worst (longest) running time
over all possibilities. □


You should be able to see that saying something is Θ( ) leaves a lot out because we
have no idea of the constants that are omitted. How can we compare two algorithms?
Here are two rules of thumb.


• If one algorithm is Θ(f(n)) and the other is Θ(g(n)), the algorithm with the slower
growing function (f or g) is probably the better choice.


• If both algorithms are Θ(f(n)), the algorithm with the simpler data structures is
probably better.


These rules are far from foolproof, but they provide some guidance.


*Polynomial Time Algorithms 


Computer scientists talk about “polynomial time algorithms.” What does this mean? 
Suppose that the algorithm can handle arbitrarily large problems and that it takes O(n)
seconds on a problem of "size" n. Then we call it a linear time algorithm. More generally,
if there is a (possibly quite large) integer k such that the worst case running time on a
problem of "size" n is O(n^k), then we say the algorithm is polynomial time.


You may have noticed the quotes around size and wondered why. It is necessary to 
specify what we mean by the size of a problem. Size is often interpreted as the number of 
bits required to specify the problem in binary form. You may object that this is imprecise 
since a problem can be specified in many ways. This is true; however, the number of bits in 
one “reasonable” representation doesn’t differ too much from the number of bits in another. 
We won’t pursue this further. 


If the worst case time for an algorithm is polynomial, theoretical computer scientists 
think of this as a good algorithm. (This is because polynomials grow relatively slowly; for 
example, exponential functions grow much faster.) The problem that the algorithm solves
is called tractable. 


Do there exist intractable problems; i.e., problems for which no polynomial time algorithm
can ever be found? Yes, but we won't study them here. More interesting is the fact
that there are a large number of practical problems for which


• no polynomial time algorithm is known and
• no one has been able to prove that the problems are intractable.
We'll discuss this a bit.
Consider the following problems. 


• Coloring problem: For any c > 2, devise an algorithm whose input can be any
simple graph and whose output answers the question "Can the graph be properly
colored in c colors?"


• Traveling salesman problem: For any B, devise an algorithm whose input
can be any n > 0 and any real valued edge labeling, λ: P_2(n) → ℝ, for K_n, the
complete graph on n vertices. The algorithm must answer the question "Is there
a cycle through all n vertices with cost B or less?" (The cost of a cycle is the sum
of λ(e) over all e in the cycle.)


• Clique problem: Given a simple graph G = (V, E) and an integer s, is there
a subset S ⊆ V, |S| = s, whose induced subgraph is the complete graph on S (i.e.,
a subgraph of G with vertex set S and with $\binom{s}{2}$ edges)?


No one knows if these problems are tractable, but it is known that, if one is tractable, then
they all are. There are hundreds more problems that people are interested in which belong
to this particular list in which all or none are tractable. These problems are called
NP-complete problems. Many people regard deciding if the NP-complete problems are tractable
to be the foremost open problem in theoretical computer science.


The NP-complete problems have an interesting property which we now discuss. If the
algorithm says "yes," then there must be a specific example that shows why this is so (an
assignment of colors, a cycle, a clique). There is no requirement that the algorithm
actually produce such an example. Suppose we somehow obtain a coloring, a cycle or a
clique which is claimed to be such an example. Part of the definition of NP-complete
requires that we be able to check the claim in polynomial time. Thus we can check a
purported example quickly but, so far as is known, it may take a long time to determine if
such an example exists. In other words, I can check your guesses quickly but I don't know
how to tell you quickly if any examples exist.
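Checking a proposed example really is fast. The following Python sketch checks a claimed coloring, a claimed traveling salesman cycle and a claimed clique; the function names and input formats are our own assumptions. Each check takes a number of steps bounded by a polynomial in the size of the input, even though no fast way of finding such an example is known.

```python
from itertools import combinations


def check_coloring(vertices, edges, coloring, c):
    """Claimed proper coloring: every vertex gets one of at most c colors
    and the two ends of every edge get different colors."""
    return (set(coloring) == set(vertices)
            and len(set(coloring.values())) <= c
            and all(coloring[u] != coloring[v] for u, v in edges))


def check_tour(n, cost, cycle, B):
    """Claimed cycle through all vertices 1..n of K_n (n >= 3) with cost <= B.
    cost maps frozenset({u, v}) to the edge label lambda({u, v})."""
    if sorted(cycle) != list(range(1, n + 1)):
        return False
    total = sum(cost[frozenset((cycle[i], cycle[(i + 1) % n]))] for i in range(n))
    return total <= B


def check_clique(edges, S, s):
    """Claimed clique: |S| = s and every pair of vertices of S is an edge."""
    edge_set = {frozenset(e) for e in edges}
    return len(S) == s and all(frozenset(p) in edge_set for p in combinations(S, 2))


triangle = [(1, 2), (2, 3), (1, 3)]
print(check_coloring([1, 2, 3], triangle, {1: "r", 2: "g", 3: "b"}, 3))  # True
print(check_clique(triangle, {1, 2, 3}, 3))                              # True
```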


There are problems like the NP-complete problems where no one knows how to do any 
checking in polynomial time. For example, modify the traveling salesman problem to ask 
for the minimum cost cycle. No one knows how to verify in polynomial time that a given 
cycle is actually the minimum cost cycle. If the modified traveling salesman problem is 
tractable, so is the one we presented above: You need only find the minimum cost cycle 
and compare its cost to B. Such problems are called NP-hard because they are at least as 
hard as NP-complete problems. A problem which is tractable if the NP-complete problems 
are tractable is called NP-easy. 


Some problems are both NP-easy and NP-hard but may not be NP-complete. Why is 
this? NP-complete problems must ask a “yes/no” type of question and it must be possible 
to check a specific example in polynomial time as noted in the previous paragraph. We
discuss an example. 


*Example 26 (Chromatic number) The chromatic number χ(G) of a graph G is the
least number of colors needed to properly color G. The problem of deciding whether a graph
can be properly colored with c colors is NP-complete. The problem of determining χ(G)
is NP-hard. If we know χ(G), then we can determine if c colors are enough by checking if
c ≥ χ(G).


The problem of determining χ(G) is also NP-easy. You can color G with c colors if
and only if c ≥ χ(G). We know that 0 ≤ χ(G) ≤ n for a graph with n vertices. Ask if c
colors suffice for c = 0, 1, 2, .... The least c for which the answer is "yes" is χ(G). Thus the
worst case time for finding χ(G) is at most n times the worst case time for the NP-complete
problem. Hence one time is O of a polynomial in n if and only if the other is. □
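The argument of this example is a reduction: any procedure that answers the c-coloring question can be called repeatedly to find χ(G). The Python sketch below makes that explicit. The decision procedure `colorable` is plain brute force over all c^n color assignments (exponential time, for illustration only); the point is that `chromatic_number` only calls it once for each c = 0, 1, 2, ... until the answer is "yes". Both names are ours, and vertices are assumed to be 1, ..., n.

```python
from itertools import product


def colorable(n, edges, c):
    """Decision problem: can vertices 1..n be properly colored with c colors?
    Brute force over all c**n assignments, so exponential time."""
    if n == 0:
        return True
    return any(all(col[u - 1] != col[v - 1] for u, v in edges)
               for col in product(range(c), repeat=n))


def chromatic_number(n, edges):
    """chi(G): the least c for which the decision procedure says "yes"."""
    c = 0
    while not colorable(n, edges, c):
        c += 1
    return c


print(chromatic_number(3, [(1, 2), (2, 3), (1, 3)]))          # 3
print(chromatic_number(4, [(1, 2), (2, 3), (3, 4), (4, 1)]))  # 2
```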


What can we do if we cannot find a good algorithm for a problem? There are three 
main types of partial algorithms: 


1. Almost good: It is polynomial time for all but a very small subset of possible 
problems. (If we are interested in all graphs, our coloring algorithm in Example 25 
is almost good for any fixed c.) 


2. Almost correct: It is polynomial time but in some rare cases does not find the 
correct answer. (If we are interested in all graphs and a fixed c, automatically 
reporting that a large graph can’t be colored with c colors is almost correct — 
but it is rather useless.) In some situations, a fast almost correct algorithm can 
be useful. 


3. Close: It is a polynomial time algorithm for a minimization problem and comes 
close to the true minimum. (There are useful close algorithms for approximating 
the minimum cycle in the Traveling Salesman Problem.) 


Some of the algorithms make use of random number generators in interesting ways. 
Unfortunately, further discussion of these problems is beyond the scope of this text. 


*A Theorem for Recursive Algorithms 


The following material may be somewhat more difficult than that in other starred sections. 


Some algorithms, such as merge sorting, call themselves. This is known as a recursive 
algorithm or a divide and conquer algorithm. 


When we try to estimate the running time of such algorithms, we obtain a recursion. In
Section 2 of Unit DT, we examined the problem of solving recursions. We saw that finding
exact solutions to recursions is difficult. The recursions that we obtain for algorithms are
not covered by the methods in that section. Furthermore, the recursions are often not
known exactly because we may only be able to obtain an estimate of the form O( ) for
some of the work. The next example illustrates this problem.
*Example 27 (Sorting by recursive merging) Given a list L of n items, we wish to
sort it. Here is the merge sorting algorithm from Section 3 of Unit DT.


Sort(L)
    If length is 1, return L
    Else
        Split L into two lists L1 and L2
        S1 = Sort(L1)
        S2 = Sort(L2)
        S = Merge(S1, S2)
        Return S
    End if
End


We need to be more specific about how the lists are split. Let m be n/2 rounded down, let
L1 be the first m items in L and let L2 be the last n − m items in L.
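A Python sketch of Sort(L) that also counts comparisons may help connect the code to T(n). The helper names `merge` and `merge_sort` and the choice to return (result, count) pairs are our own; the split follows the rule just given, with m = ⌊n/2⌋. We also treat the empty list as already sorted, a case the pseudocode does not mention.

```python
def merge(s1, s2):
    """Merge two sorted lists; return (merged list, comparisons used)."""
    out, i, j, comps = [], 0, 0, 0
    while i < len(s1) and j < len(s2):
        comps += 1
        if s1[i] <= s2[j]:
            out.append(s1[i])
            i += 1
        else:
            out.append(s2[j])
            j += 1
    out.extend(s1[i:])             # at most one of these two is non-empty
    out.extend(s2[j:])
    return out, comps


def merge_sort(L):
    """Sort(L) from the text; returns (sorted list, total comparisons T(n))."""
    n = len(L)
    if n <= 1:
        return list(L), 0
    m = n // 2                     # m = n/2 rounded down
    s1, t1 = merge_sort(L[:m])     # L1 = first m items, gives T(m)
    s2, t2 = merge_sort(L[m:])     # L2 = last n - m items, gives T(n - m)
    s, a_n = merge(s1, s2)         # m <= a_n <= n - 1
    return s, t1 + t2 + a_n


print(merge_sort([5, 3, 8, 1, 9, 2, 7, 4]))
```

On an already sorted list of length 8, each merge uses exactly m comparisons and the total is 12; the worst case for length 8, with a_n = n − 1 at every step, is 17.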


One way to measure the running time of Sort(L) is to count the number of comparisons
that are required. Let this number be T(n). We would like to know how fast T(n) grows
as a function of n so we can tell how good the algorithm is. For example, is T(n) = Θ(n)?
Is T(n) = Θ(n^2)? Or does it behave differently?


We now start work on this problem. Since the sorting algorithm is recursive (calls 
itself), we will end up with a recursion. This is a general principle for recursive algorithms. 
You should see why after the next two paragraphs. 


All comparisons are done in Merge(S1, S2). It can be shown that the number of
comparisons in Merge is between m and n − 1. We take that fact as given.


Three lines of code are important:


S1 = Sort(L1)        a recursive call, so it gives us T(m);
S2 = Sort(L2)        a recursive call, so it gives us T(n − m);
S = Merge(S1, S2)    where the comparisons are, so it gives us a_n


with m ≤ a_n ≤ n − 1.


We obtain T(n) = T(m) + T(n − m) + a_n, where all we know about a_n is that it is between
m and n − 1. What can we do?


Not only is this a type of recursion we haven't seen before, we don't even know the
recursion fully since all we have is upper and lower bounds for a_n. The next theorem solves
this problem for us. □


The following theorem provides an approximate solution to an important class of ap- 
proximate recursions that arise in divide and conquer algorithms. We’ll apply it to merge 
sorting. In the theorem 


• T(n) is the running time for a problem of size n.


• If the algorithm calls itself at w places in the code, then the problem is divided into
w smaller problems of the same kind and s_1(n), ..., s_w(n) are the sizes of the smaller
problems.

• The constant c measures how much smaller each of these problems is.


• The time needed for the rest of the code is a_n.


*Theorem 8 (Master Theorem for Recursions*) Suppose that there are
(i) numbers N, b, w and c that do not depend on n, with N and w positive integers and 0 < c < 1;
(ii) a sequence a_1, a_2, ...;
(iii) functions s_1, s_2, ..., s_w, and T
such that
(a) T(n) > 0 for all n > N and a_n ≥ 0 for all n > N;
(b) T(n) = a_n + T(s_1(n)) + T(s_2(n)) + ··· + T(s_w(n)) for all n > N;
(c) a_n is Θ(n^b) (if a_n = 0 for all large n, set b = −∞);
(d) |s_i(n) − cn| is O(1) for i = 1, 2, ..., w.
Let d = −log(w)/log(c). Then

$$T(n) \text{ is } \begin{cases} \Theta(n^d) & \text{if } b < d,\\ \Theta(n^d \log n) & \text{if } b = d,\\ \Theta(n^b) & \text{if } b > d. \end{cases}$$

Note that b = 0 corresponds to a_n being in Θ(1) since n^0 = 1. In other words, a_n is
bounded by nonzero constants for all large n: 0 < C_1 ≤ a_n ≤ C_2.


Let’s apply the theorem to our recursion for merge sorting:
$$T(n) = a_n + T(s_1(n)) + T(s_2(n)),$$
where
$$s_1(n) = \lfloor n/2 \rfloor, \quad s_2(n) = n - \lfloor n/2 \rfloor \quad\text{and}\quad s_1(n) \le a_n \le n-1.$$
Note that $s_1(n)$ and $s_2(n)$ differ from $n/2$ by at most $1/2$ and that $a_n$ is $\Theta(n)$. Thus we
can apply the theorem with $w = 2$, $b = 1$ and $c = 1/2$. We have
$$d = -\log(2)/\log(1/2) = \log(2)/\log(2) = 1.$$
Since $b = d = 1$, we conclude that $T(n)$ is $\Theta(n \log n)$.
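The conclusion can be checked experimentally. The following is a minimal sketch of an ordinary top-down merge sort in Python, written for this illustration (the function names `merge`, `merge_sort` and `comparisons` are ours, not the text's). It splits a list of length n into pieces of lengths $\lfloor n/2 \rfloor$ and $n - \lfloor n/2 \rfloor$, as in the analysis above, and counts the comparisons made by the merge step. Using random inputs is an assumption of the sketch: the exact count depends on the input order, but every merge stays within the bounds $m \le a_n \le n-1$. If $T(n)$ is $\Theta(n \log n)$, the printed ratio $T(n)/(n\log_2 n)$ should stay roughly constant as n grows.

```python
import math
import random


def merge(s1, s2, counter):
    """Merge two sorted lists; counter[0] accumulates element comparisons."""
    merged = []
    i = j = 0
    while i < len(s1) and j < len(s2):
        counter[0] += 1                  # one comparison per pass through the loop
        if s1[i] <= s2[j]:
            merged.append(s1[i])
            i += 1
        else:
            merged.append(s2[j])
            j += 1
    merged.extend(s1[i:])                # leftovers are copied without comparing
    merged.extend(s2[j:])
    return merged


def merge_sort(items, counter):
    """Recursive sort with split sizes floor(n/2) and n - floor(n/2)."""
    n = len(items)
    if n <= 1:
        return list(items)
    m = n // 2
    s1 = merge_sort(items[:m], counter)  # contributes T(m)
    s2 = merge_sort(items[m:], counter)  # contributes T(n - m)
    return merge(s1, s2, counter)        # contributes a_n comparisons


def comparisons(n, seed=0):
    """Sort a random list of n distinct integers; return (sorted list, T(n))."""
    data = random.Random(seed).sample(range(10 * n), n)
    counter = [0]
    return merge_sort(data, counter), counter[0]


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        _, t = comparisons(n)
        print(n, t, round(t / (n * math.log2(n)), 3))
```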


How do we use the theorem on divide and conquer algorithms? First, we must find 
a parameter n that measures the size of the problem; for example, the length of a list to 
be sorted, the degree of polynomials that we want to multiply, the number of vertices in a 
graph that we want to study. Then use the interpretation of the various parameters that 
was given just before the theorem. 


Our final example is more difficult because the algorithm that we study is more com- 
plicated. It was believed for some time that the quickest way to multiply polynomials was 
the “obvious” way that is taught when polynomials are first studied. That is not true. The 
next example contains an algorithm for faster multiplication of polynomials. There are also 
faster algorithms for multiplying matrices. 


* This is not the most general version of the theorem; however, this version is easier to 
understand and is usually sufficient. For a more general statement and a proof, see any 
thorough text on the analysis of algorithms. 
*Example 28 (Recursive multiplication of polynomials) Suppose we want to multiply
two polynomials of degree at most n, say
$$P(x) = p_0 + p_1x + \cdots + p_nx^n \quad\text{and}\quad Q(x) = q_0 + q_1x + \cdots + q_nx^n.$$
The natural way to do this is to use the distributive law to generate $(n+1)^2$ products
$$p_0q_0,\ p_0q_1x,\ p_0q_2x^2,\ \ldots,\ p_nq_nx^{2n}$$
and then collect the terms that have the same powers of x. This involves $(n+1)^2$
multiplications of coefficients and, it can be shown, $n^2$ additions of coefficients. Thus, the
amount of work is $\Theta(n^2)$. Unless we expect P(x) or Q(x) to have some coefficients that are
zero, this seems to be the best we can do. Not so! We now present and analyze a faster
recursive algorithm.
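The operation counts for the “obvious” method can be checked with a short sketch. The Python function below is a hypothetical helper, not from the text: it multiplies coefficient lists by the distributive law and tallies coefficient multiplications and additions. For two polynomials of degree n it reports $(n+1)^2$ multiplications and $n^2$ additions, because the $(n+1)^2$ products fall into $2n+1$ powers of x and every product after the first one for a given power costs one addition.

```python
def naive_multiply(p, q):
    """Schoolbook product of coefficient lists p = [p_0, ..., p_n], q = [q_0, ..., q_n].

    Returns (coefficients of P(x)Q(x), multiplications, additions).
    """
    result = [None] * (len(p) + len(q) - 1)
    mults = adds = 0
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            term = pi * qj                 # the product p_i q_j x^(i+j)
            mults += 1
            if result[i + j] is None:
                result[i + j] = term       # first term with this power of x
            else:
                result[i + j] += term      # collecting like powers: one addition
                adds += 1
    return result, mults, adds


if __name__ == "__main__":
    n = 4
    coeffs, m, a = naive_multiply(list(range(1, n + 2)), [1] * (n + 1))
    print(coeffs, m, a)   # m == (n+1)**2 == 25 and a == n**2 == 16
```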


The algorithm depends on the following identity, which you should verify by checking
the algebra.

Identity: If $P_L(x)$, $P_H(x)$, $Q_L(x)$ and $Q_H(x)$ are polynomials, then
$$\bigl(P_L(x) + P_H(x)x^m\bigr)\bigl(Q_L(x) + Q_H(x)x^m\bigr) = A(x) + \bigl(C(x) - A(x) - B(x)\bigr)x^m + B(x)x^{2m},$$
where
$$A(x) = P_L(x)Q_L(x), \qquad B(x) = P_H(x)Q_H(x),$$
and
$$C(x) = \bigl(P_L(x) + P_H(x)\bigr)\bigl(Q_L(x) + Q_H(x)\bigr).$$

We can think of this identity as telling us how to multiply two polynomials P(x) and
Q(x) by splitting them into lower degree terms ($P_L(x)$ and $Q_L(x)$) and higher degree terms
($P_H(x)x^m$ and $Q_H(x)x^m$):
$$P(x) = P_L(x) + P_H(x)x^m \quad\text{and}\quad Q(x) = Q_L(x) + Q_H(x)x^m.$$


The identity requires three polynomial multiplications to compute A(x), B(x) and C(x).
This leads naturally to two questions:

• Haven’t things gotten worse, with three polynomial multiplications instead of just one?
No. The three multiplications involve polynomials of much lower degrees. We will see
that this leads to a gain in speed.

• How should we do these three polynomial multiplications? Apply the identity to each
of them. In other words, design a recursive algorithm. We do that now.


Here is the algorithm for multiplying two polynomials $P(x) = p_0 + p_1x + \cdots + p_nx^n$ and
$Q(x) = q_0 + q_1x + \cdots + q_nx^n$ of degree at most n.

MULT(P(x), Q(x), n)
    If (n=0) Return p_0 q_0
    Else
        Let m = n/2 rounded up.
        P_L(x) = p_0 + p_1 x + ... + p_{m-1} x^{m-1}
        P_H(x) = p_m + p_{m+1} x + ... + p_n x^{n-m}
        Q_L(x) = q_0 + q_1 x + ... + q_{m-1} x^{m-1}
        Q_H(x) = q_m + q_{m+1} x + ... + q_n x^{n-m}
        A(x) = MULT(P_L(x), Q_L(x), m-1)
        B(x) = MULT(P_H(x), Q_H(x), n-m)
        C(x) = MULT(P_L(x) + P_H(x), Q_L(x) + Q_H(x), n-m)
        D(x) = A(x) + (C(x) - A(x) - B(x)) x^m + B(x) x^{2m}
        Return D(x)
    End if
End
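For readers who want to run MULT, here is a minimal Python sketch under one assumed convention: a polynomial of degree at most n is stored as a list of its n + 1 coefficients, constant term first. The names `add_poly`, `mult` and `count_mults` are ours. Besides the product, the sketch counts coefficient multiplications (additions are ignored, as in the analysis that follows), so the count can be compared with the recursion for T(n).

```python
def add_poly(a, b):
    """Coefficient-wise sum of two coefficient lists (the shorter is padded with 0)."""
    if len(a) < len(b):
        a, b = b, a
    return [x + (b[i] if i < len(b) else 0) for i, x in enumerate(a)]


def mult(p, q, n, counter):
    """Multiply P and Q, given as coefficient lists of length n + 1.

    Mirrors MULT: split at m = ceil(n/2), make three recursive calls and
    combine them with the identity. counter[0] accumulates the number of
    coefficient multiplications (only the base case multiplies coefficients).
    """
    if n == 0:
        counter[0] += 1
        return [p[0] * q[0]]
    m = (n + 1) // 2                      # n/2 rounded up
    p_low, p_high = p[:m], p[m:]          # P_L has degree m-1, P_H degree n-m
    q_low, q_high = q[:m], q[m:]
    a = mult(p_low, q_low, m - 1, counter)
    b = mult(p_high, q_high, n - m, counter)
    c = mult(add_poly(p_low, p_high), add_poly(q_low, q_high), n - m, counter)
    # D(x) = A(x) + (C(x) - A(x) - B(x)) x^m + B(x) x^(2m)
    d = [0] * (2 * n + 1)
    for i, coef in enumerate(a):
        d[i] += coef
        d[i + m] -= coef
    for i, coef in enumerate(b):
        d[i + m] -= coef
        d[i + 2 * m] += coef
    for i, coef in enumerate(c):
        d[i + m] += coef
    return d


def count_mults(n):
    """T(n): coefficient multiplications MULT performs on two degree-n inputs."""
    counter = [0]
    mult([1] * (n + 1), [1] * (n + 1), n, counter)
    return counter[0]


if __name__ == "__main__":
    print(mult([1, 2, 3], [4, 5, 6], 2, [0]))   # [4, 13, 28, 27, 18]
    print([count_mults(n) for n in range(8)])
```

For instance, `count_mults(15)` gives $81 = 3^4$, while the straightforward method uses $16^2 = 256$ multiplications; more generally, for $n = 2^k - 1$ both recursive sizes $m-1$ and $n-m$ equal $2^{k-1} - 1$, so the count is $3^k$.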


As is commonly done, we imagine a polynomial stored as a vector of coefficients. The
amount of work required is then the number of times we have to multiply or add two
coefficients. For simplicity, we just count multiplications. Let that number be T(n). You
should be able to see that $T(0) = 1$ and
$$T(n) = T(m-1) + T(n-m) + T(n-m) \quad\text{for } n > 0.$$
We can write this as
$$T(n) = T(m-1) + T(n-m) + T(n-m) + a_n, \qquad a_0 = 1 \text{ and } a_n = 0 \text{ for } n > 0.$$
Note that, since both $m-1$ and $n-m$ differ from $n/2$ by at most 1, we have $w = 3$ and $c = 1/2$.
Also $b = -\infty$.

We have $d = \log 3/\log 2 > b$. Thus $T(n)$ is $\Theta(n^{\log 3/\log 2})$. Since $\log 3/\log 2$ is about
1.6, which is less than 2, this is less work than the straightforward method when n is large
enough. (Recall that the work there was in $\Theta(n^2)$.) □


Exercises for Section 4 


4.1. We have three algorithms for solving a problem for graphs. Suppose algorithm A
takes $n^2$ milliseconds to run on a graph with n vertices, algorithm B takes $100n$
milliseconds and algorithm C takes $100(2^{n/10} - 1)$ milliseconds.


(a) Compute the running times for the three algorithms with n = 5, 10, 30, 100 
and 300. Which algorithm is fastest in each case? slowest? 
(b) Which algorithm is fastest for all very large values of n? Which is slowest?


4.2. Let p(x) be a polynomial of degree k with positive leading coefficient and suppose
that $a > 1$. Prove the following.

(a) $\Theta(p(n))$ is $\Theta(n^k)$.

(b) $O(p(n))$ is $O(n^k)$.

(c) $\lim_{n\to\infty} p(n)/a^n = 0$. (Also, what does this say about the speed of a polynomial
time algorithm versus one which takes exponential time?)

(d) Unless $p(x) = p_1x + p_0$ for some $p_1$ and $p_0$, there is no C such that $a^{p(n)}$ is
$\Theta(a^{Cn})$.


4.3. In each case, prove that g(n) is O(f(n)) using the definition of “g is O(f)”. (See 
Definition 21.) 


(a) g(n) =n? +5n? +10, f(n) = 20n°. 
(b) g(n) =n? + 5n? +10, f(n) = 200n? 


4.4. In each case, show that the given series has the indicated property.
(a) $\sum_{i=1}^{n} i^2$ is $O(n^3)$.
(b) $\sum_{i=1}^{n} i^3$ is $O(n^4)$.
(c) $\sum_{i=1}^{n} i^{1/2}$ is $O(n^{3/2})$.


4.5. Show each of the following:
(a) $\sum_{i=1}^{n} i^{-1}$ is $O(\log_b(n))$ for any base $b > 1$.
(b) $\log_b(n!)$ is $O(n\log_b(n))$ for any base $b > 1$.
(c) $n!$ is $O\bigl((n/e)^{n+1/2}\bigr)$.


*4.6. The following algorithm multiplies two $n \times n$ matrices A and B and puts the answer
in C. Let T(n) be the running time of the algorithm. Find a simple function f(n)
so that T(n) is O(f(n)).

MATRIXMULT(n,A,B,C)
    For i=1,...,n
        For j=1,...,n
            C(i,j)=0
            For k=1,...,n
                C(i,j) = C(i,j) + A(i,k)*B(k,j)
            End for
        End for
    End for
End
*4.7. The following algorithm computes $x^n$ for n a positive integer, where x is a
complicated object (e.g., a large matrix). MULT(x,y) is a procedure that multiplies
two such objects. Let T(n) be the number of times MULT is called. Find a simple
function f(n) so that T(n) is O(f(n)).

POW(x, n)
    If (n=1) Return x
    Else
        Let q be n/2 rounded down and r = n - 2q.
        y = MULT(x, x)
        z = POW(y, q)
        If (r=0) Return z
        Else
            w = MULT(x, z)
            Return w
        End if
    End if
End
Multiple Choice Questions for Review


Some of the following questions assume that you have done the exercises. 


1. Indicate which, if any, of the following five graphs G = (V, E, φ), |V| = 5, is not
isomorphic to any of the other four.


(a) ¢= (a 3} (2.4) (1.2) (2.3) (3.5) a) 
6) 6= (2y abn kn on ory 5) 
() $= (44's) a’ 9 oy hn os) 
(d) d= Ge oa (2.3) (34) {4 {4,5} {4, ) 


b a e d Cc 
(e) o= (as 1,3} {1,3} {2.3} {2.5} aa) 


2. Indicate which, if any, of the following five graphs G = (V, E, φ), |V| = 5, is not
connected.


(a) ¢= Ger nee oe 3,4} (1,5) in) 
(b) g= Ga as} {1.3} (23) (25) cls) 
(c) d= as sy {1.3} (2.3) 2.8} u's) 
(d) d= ia a) 1.33 2.3} {3,4} nee 


_ a b c d e if 
(c) o= ee {2,3} {1,2} {1,3} {2,3} a) 


3. Indicate which, if any, of the following five graphs G = (V, E, φ), |V| = 5, have an
Eulerian circuit.


(a) d= (a {12} {2.3} {3.4} {45} coe 
(b) ¢= ee ay {1.3} {2,3} (2,4) me 
(c) d= (ay 11.9} (2,3) {3,4} (4.5} a 
(d) ¢= ee 1.3} {1.3} {2,3} 2.5} ua) 


a b c d e 
(e) o= (ity {3,4} {1,2} {2.3} {3.5} aa) 


4. A graph with V = {1,2,3,4} is described by φ =
How many Hamiltonian cycles does it have?


a b c d e f 
a {1,2} {1,4} {2,3} {3,4} as) . 


Review Questions 


5. A graph with V = {1,2,3,4} is described by φ = Cas aa a4} ne 34} os) ;
a 0b de f 
2 As 2 


It has weights on its edges given by A = G 9 : . How many minimum 


spanning trees does it have? 


6. Define an RP-tree by the parent-child adjacency lists as follows:
(i) Root B: J, H, K; (ii) H: P, Q, R; (iii) Q: S, T; (iv) K: L, M, N.


The postorder vertex sequence of this tree is 

(a) J, P, S, T, Q, R, H, L, M, N, K, B.

(by P85, T, JQ. Ry Hy by NE NK, B: 

(c) P, S, T, Q, R, H, L, M, N, K, J, B.

(d)> PSP .-@.- Rody Hd, NLAN, KB: 

(e) S, T, Q, J, P, R, H, L, M, N, K, B. 

7. Define an RP-tree by the parent-child adjacency lists as follows:


(i) Root B: J, H, K; (ii) J: P,Q, R; (iii) Q: S, T; (iv) K: L, M, N. 


The preorder vertex sequence of this tree is 
(a) B, J, H, K, P, Q, R, L, M,N, S, T. 
(b) B, J, P, Q, S, T, R, H, K, L, M, N.
(c) B, J, P, Q, S, T, R, H, L, M, N, K. 
(d) B, J, Q, P, S, T, R, H, L, M,N, K. 
(e) B, J, Q, S, T, P, R, H, K, L, M, N. 
8. For which of the following does there exist a graph G = (V, E, φ) satisfying the specified
conditions?

(a) A tree with 9 vertices and the sum of the degrees of all the vertices is 18.

(b) A graph with 5 components 12 vertices and 7 edges.

(c) A graph with 5 components 30 vertices and 24 edges.

(d) A graph with 9 vertices, 9 edges, and no cycles.

(e) A connected graph with 12 edges 5 vertices and fewer than 8 cycles.


9. For which of the following does there exist a simple graph G = (V, E) satisfying the
specified conditions?


(a) It has 3 components 20 vertices and 16 edges. 


(b) It has 6 vertices, 11 edges, and more than one component. 
(c) It is connected and has 10 edges 5 vertices and fewer than 6 cycles.

(d) It has 7 vertices, 10 edges, and more than two components.

(e) It has 8 vertices, 8 edges, and no cycles.

10. For which of the following does there exist a tree satisfying the specified constraints?
(a) A binary tree with 65 leaves and height 6.

(b) A binary tree with 33 leaves and height 5.

(c) A full binary tree with height 5 and 64 total vertices.

(d) A full binary tree with 23 leaves and height 23.

(e) A rooted tree of height 3, every vertex has at most 3 children. There are 40 total
vertices.

11. For which of the following does there exist a tree satisfying the specified constraints?
(a) A full binary tree with 31 leaves, each leaf of height 5.

(b) A rooted tree of height 3 where every vertex has at most 3 children and there are
41 total vertices.

(c) A full binary tree with 11 vertices and height 6.

(d) A binary tree with 2 leaves and height 100.

(e) A full binary tree with 20 vertices.

12. The number of simple digraphs with |V| = 3 is

(a) 2° (b) 2? (c) 27 (d) 2° (e) 2°

13. The number of simple digraphs with |V| = 3 and exactly 3 edges is
(a) 92 (b) 88 (c) 80 (d) 84 (e) 76

14. The number of oriented simple graphs with |V| = 3 is

(a) 27 (b) 24 (c) 21 (d) 18 (e) 15

15. The number of oriented simple graphs with |V| = 4 and 2 edges is
(a) 40 (b) 50 (c) 60 (d) 70 (e) 80

16. In each case the depth-first sequence of an ordered rooted spanning tree for a graph
G is given. Also given are the non-tree edges of G. Which of these spanning trees is a
depth-first spanning tree?

(a) 123242151 and {3,4}, {1,4}
(b) 123242151 and {4,5}, {1,3}
(c) 123245421 and {2,5}, {1,4}
(d) 123245421 and {3,5}, {1,4}
(e) 123245421 and {3,4}, {1,4}
12 53a eas 
(a) O((In(n))"7) — (b) On(n))_— (ce) O(N"”?)_— (d) O(n*/?)_—(e) O(n?) 


18. Compute the total number of bicomponents in all of the following three simple graphs,
G = (V, E) with |V| = 5. For each graph the edge sets are as follows:


Bath (2.8),.43 45 140) tisk oh {3,53 
B=4{{i,2)5 42,3), 484) (45h 4,3) + 
Beato 42 Sh t4/bh. V1 ott 
(a) 4 (b) 5 (c) 6 (d) 7 (e) 8
19. Let $b > 1$. Then $\log_b((n^2)!)$ is


(a) $O(\log_b(n!))$

(b) O(log, (2 n!))
(c) $O(n\log_b(n))$
(d) $O(n^2\log_b(n))$
(e) O(n log, (n*))

20. What is the total number of additions and multiplications in the following code?

s := 0
for i := 1 to n
    s := s + i
    for j := 1 to i
        s := s + j*i
    next j
next i
s := s + 10

(a) $n$ (b) $n^2$ (c) $n^2+2n$ (d) $n(n+1)$ (e) $(n+1)^2$


Answers: 1 (d), 2 (e), 3 (c), 4 (a), 5 (a), 6 (e), 7 (e), 8 (c), 9 (a), 10 (d), 11 (b),
12 (c), 13 (e), 14 (b), 15 (a), 16 (b), 17 (b), 18 (a), 19 (b), 20 (b), 21 (b), 22 (d),
23 (c), 24 (d), 25 (d)
1 (c), 2 (d), 3 (a), 4 (b), 5 (a), 6 (c), 7 (c), 8 (c), 9 (a),
10 (c), 11 (a), 12 (c), 13 (b), 14 (d), 15 (a), 16 (c), 17 (b), 18 (d), 19 (a), 20 (d),
21 (c), 22 (e), 23 (c)
1 (d), 2 (e), 3 (d), 4 (c), 5 (b), 6 (c), 7 (d), 8 (a), 9 (c),
10 (c), 11 (d), 12 (a), 13 (b), 14 (a), 15 (c), 16 (b), 17 (c), 18 (b), 19 (e), 20 (a),
21 (a), 22 (b), 23 (b)
1 (a), 2 (e), 3 (ce), 4 (c), 5 (b), 6 (a), 7 (b), 8 (b), 9 (d),
10 (e), 11 (d), 12 (a), 13 (d), 14 (a), 15 (c), 16 (c), 17 (c), 18 (c), 19 (d), 20 (e).
This is a simple application of the Rules of Sum and Product.


(a) Choose a discrete math text OR a data structures text, etc. This gives 5 + 2 + 
6+3=16. 


(b) Choose a discrete math text AND a data structures text, etc. This gives 5 x 2 x 
6 x 3 = 180. 


We can form n digit numbers by choosing the leftmost digit AND choosing the next
digit AND ⋯ AND choosing the rightmost digit. The first choice can be made in 9
ways since a leading zero is not allowed. The remaining $n-1$ choices can each be
made in 10 ways. By the Rule of Product we have $9 \times 10^{n-1}$. To count numbers with
at most n digits, we could sum up $9 \times 10^{k-1}$ for $1 \le k \le n$. The sum can be evaluated
since it is a geometric series. This does not include the number 0. Whether we add 1
to include it depends on our interpretation of the problem’s requirement that there be
no leading zeroes. There is an easier way. We can pad out a number with fewer than n
digits by adding leading zeroes. The original number can be recovered from any such
n digit number by stripping off the leading zeroes. Thus we see by the Rule of Product
that there are $10^n$ numbers with at most n digits. If we wish to rule out 0 (which pads
out to a string of n zeroes), we must subtract 1.
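Both counts, and the padding argument, can be confirmed by brute force for small n. The Python sketch below (helper names are ours) counts positive integers by their number of decimal digits and checks that zero-padding every number in $0, 1, \ldots, 10^n - 1$ to width n produces each n-character digit string exactly once.

```python
from itertools import product


def count_exactly(n):
    """Positive integers with exactly n decimal digits (no leading zero)."""
    return sum(1 for x in range(1, 10 ** n) if len(str(x)) == n)


def count_at_most(n):
    """Positive integers with at most n decimal digits (0 not included)."""
    return sum(1 for x in range(1, 10 ** n) if len(str(x)) <= n)


def padding_is_bijection(n):
    """Zero-padding 0, ..., 10^n - 1 to width n gives every n-digit string once."""
    padded = [str(x).zfill(n) for x in range(10 ** n)]
    all_strings = ["".join(d) for d in product("0123456789", repeat=n)]
    return sorted(padded) == sorted(all_strings)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_exactly(n), 9 * 10 ** (n - 1), count_at_most(n), 10 ** n - 1)
```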


For each element of S you must make one of two choices: “x is/isn’t in the subset.”
To visualize the process, list the elements of the set in any order: $a_1, a_2, \ldots, a_{|S|}$. We
can construct a subset by

including $a_1$ or not AND
including $a_2$ or not AND
⋮
including $a_{|S|}$ or not.

By the Rule of Product, there are $2^{|S|}$ subsets.

(a) By the Rule of Product, we have $9 \times 10 \times \cdots \times 10 = 9 \times 10^{n-1}$.
(b) By the Rule of Product, we have $9^n$.
(c) By the Rule of Sum, (answer) $+\ 9^n = 9 \times 10^{n-1}$ and so the answer is $9(10^{n-1} - 9^{n-1})$.


(a) This is like the previous exercise. There are $26^4$ 4-letter strings and there are
$(26-5)^4$ 4-letter strings that contain no vowels. Thus we have $26^4 - 21^4$.

(b) We can do this in two ways:

First way: Break the problem into 4 problems, depending on where the vowel is
located. (This uses the Rule of Sum.) For each subproblem, choose each letter in the
list and use the Rule of Product. We obtain one factor equal to 5 and three factors
equal to 21. Thus we obtain $5 \times 21^3$ for each subproblem and $4 \times 5 \times 21^3$ for the final
answer.

Second way: Choose one of the 4 positions for the vowel, choose the vowel and choose
each of the 3 consonants. By the Rule of Product we have $4 \times 5 \times 21 \times 21 \times 21$.
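Both answers can be confirmed by listing all $26^4 = 456{,}976$ strings. The Python sketch below is our own helper. It assumes the vowels are A, E, I, O and U, which is what the count $26 - 5 = 21$ consonants implies, and it tallies strings with at least one vowel and strings with exactly one vowel.

```python
from itertools import product
from string import ascii_uppercase

VOWELS = set("AEIOU")  # assumed vowel set: five vowels, so 21 consonants


def vowel_tallies(length=4):
    """Return (#strings with at least one vowel, #strings with exactly one vowel)."""
    at_least_one = exactly_one = 0
    for word in product(ascii_uppercase, repeat=length):
        v = sum(1 for ch in word if ch in VOWELS)
        if v >= 1:
            at_least_one += 1
        if v == 1:
            exactly_one += 1
    return at_least_one, exactly_one


if __name__ == "__main__":
    print(vowel_tallies(), (26 ** 4 - 21 ** 4, 4 * 5 * 21 ** 3))
```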


The only possible vowel and consonant pattern satisfying the two nonadjacent vowels 
and initial and terminal consonant conditions is CVCVC. By the Rule of Product, 
there are 3 x 2 x 3 x 2 x 3 = 108 possibilities. 
To form a composition of n, we can write n ones in a row and insert either “+” or “,”
in the spaces between them. This is a series of 2 choices at each of $n-1$ spaces, so we
obtain $2^{n-1}$ compositions of n. The compositions of 4 are
(4), (3,1), (1,3), (2,2), (2,1,1), (1,2,1), (1,1,2), (1,1,1,1).


The allowable letters in alphabetic order are A, I, L, S, and T. There are 216 words 
that begin with L, and the same number that begin with S, and with T. The word
we are asked to find is the last one that begins with L. Thus the word is of the form 
LVCVCC, LVCCVC, or LCVCVC. Since all of the consonants in our allowable- 
letters list come after the vowels, we want a word of the form LCVCVC. We need 
to start off LTVCVC. The next letter, a vowel, needs to be I (bigger than A in the 
alphabet). Thus we have LTICVC. Continuing in this way we get LTITIT. The 
next name in dictionary order starts off with S and is of the form SVCVCC. We 
now choose the vowels and consonants as small as possible: SALALL. But, this word 
doesn’t satisfy the condition that adjacent consonants must be different. Thus the 
next legal word is SALALS. 


The ordering on the $C_i$ is as follows:
$C_1$ = ((2,4), (2,5), (3,5))    $C_2$ = (AA, AI, IA, II)
$C_3$ = (LL, LS, LT, SL, SS, ST, TL, TS, TT)    $C_4$ = (LS, LT, SL, ST, TL, TS).
The first seven are
(2,4)(AA)(LL)(LS), (2,4)(AA)(LL)(LT), (2,4)(AA)(LL)(SL),
(2,4)(AA)(LL)(ST), (2,4)(AA)(LL)(TL),
(2,4)(AA)(LL)(TS), (2,4)(AA)(LS)(LS).
The last 7 are
(3,5)(II)(TS)(TS), (3,5)(II)(TT)(LS), (3,5)(II)(TT)(LT),
(3,5)(II)(TT)(SL), (3,5)(II)(TT)(ST),
(3,5)(II)(TT)(TL), (3,5)(II)(TT)(TS).

The actual names can be constructed by following the rules of construction from these
strings of symbols (e.g., (3,5)(II)(TT)(LS) says place the vowels II in positions 3,5, the
nonadjacent consonants are TT and the adjacent consonants are LS to get LSITIT).
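The ordering is lexicographic order on the product $C_1 \times C_2 \times C_3 \times C_4$, so it can be generated mechanically. The Python sketch below is our own. The decoding rule in `build` is our reading of the construction just described: the vowels go in the listed positions, the adjacent consonant pair fills the two neighboring consonant positions, and the nonadjacent pair fills the other two consonant positions from left to right.

```python
from itertools import product

C1 = [(2, 4), (2, 5), (3, 5)]                          # positions of the two vowels
C2 = ["AA", "AI", "IA", "II"]                          # the vowels, left to right
C3 = [x + y for x in "LST" for y in "LST"]             # nonadjacent consonants
C4 = [x + y for x in "LST" for y in "LST" if x != y]   # adjacent consonant pair

ORDER = list(product(C1, C2, C3, C4))   # lexicographic order on C1 x C2 x C3 x C4


def build(vowel_pos, vowels, nonadjacent, adjacent):
    """Assemble the 6-letter name described by one element of ORDER."""
    word = [None] * 6
    for pos, v in zip(vowel_pos, vowels):
        word[pos - 1] = v                               # positions are 1-based
    slots = [i for i in range(6) if word[i] is None]    # four consonant slots
    # Exactly two of the consonant slots are neighbors; they get the adjacent pair.
    j = next(t for t in range(3) if slots[t + 1] == slots[t] + 1)
    pair = slots[j:j + 2]
    for i, ch in zip(pair, adjacent):
        word[i] = ch
    for i, ch in zip([s for s in slots if s not in pair], nonadjacent):
        word[i] = ch
    return "".join(word)


if __name__ == "__main__":
    for item in ORDER[:7] + ORDER[-7:]:
        print(item, build(*item))
```

There are $3 \times 4 \times 9 \times 6 = 648$ elements in all, which agrees with the three groups of 216 words mentioned in the previous solution.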


(a) One way to do this is to list all the possible multisets in some order. If you do 
this carefully, you will find that there are 15 of them. Unfortunately, it is easy to miss 
something if you do not choose the order carefully. One way to do this is to first write 
all the a’s in the multiset, then all the b’s and then all the c’s. For example, we would
write the multiset {a,b,c,a} as aabc. We can now list these in lex order:

aaaa, aaab, aaac, aabb, aabc, aacc, abbb, abbc,
abcc, accc, bbbb, bbbc, bbcc, bccc, cccc

For (b), the answer is that there are infinitely many, because an element can be
repeated any number of times. In fact, infinitely many multisets can be formed
by using just a.
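Python’s `itertools.combinations_with_replacement` produces exactly these sorted strings, and in the same lex order, so it gives a quick check on part (a). This is a minimal sketch for illustration.

```python
from itertools import combinations_with_replacement

# Each 4-element multiset drawn from {a, b, c}, written with its letters sorted.
multisets = ["".join(m) for m in combinations_with_replacement("abc", 4)]

if __name__ == "__main__":
    print(len(multisets))         # 15
    print(", ".join(multisets))
```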


(a) We can arrange n people in n! ways. Use n = 7. 


(b) Arrange b boys (b! ways) AND arrange g girls (g! ways) AND choose which list 
comes first (2 ways). Thus we have 2(b! g!). Here b = 3 and g = 4 and the answer 
is 288. 


(c) As in (b), we arrange the girls and the boys separately, AND then we interleave 
the two lists as GBGBGBG. Thus we get 4! 3! = 144. 


This refers to the previous solution. 
(a) Use n = 6.
(b) b= g =3 and the answer is 72. 


(c) We can interleave in two ways, as BGBGBG or as GBGBGB and so we get 
2(3! 3!) = 72. 


For (a) we have the circular list discussed in the text and the answer is therefore
$n!/n = (n-1)!$.

For (b), note that each circular list gives two ordinary lists: one starting with the
girls and the other with the boys. Hence the answer is $2(b!\,g!)/2 = b!\,g!$. For the two
problems we have $4!\,3! = 144$ and $3!\,3! = 36$.

For (c), it is impossible if $b < g$ since this forces two girls to sit together. If we have
$b = g$, circular lists are possible. As in the unrestricted case, each circular list gives
$n = b + g = 2g$ linear lists by cutting it arbitrarily. Thus we get $2(g!)^2/2g = g!\,(g-1)!$,
which in this case is $3!\,2! = 12$.

Each of the 7 letters ABMNRST appears once and each of the letters CIO appears
twice. Thus we must form a list of length k from the 10 distinct letters. The solutions
are

k = 2: 10 × 9 = 90
k = 3: 10 × 9 × 8 = 720
k = 4: 10 × 9 × 8 × 7 = 5040


Each of the 7 letters ABMNRST appears once and each of the letters CIO appears
twice.

• For k = 2, the letters are distinct OR equal. There are $(10)_2 = 90$ distinct choices.
Since the only repeated letters are CIO, there are 3 ways to get equal letters. This
gives 93.

• For k = 3, we have either all distinct ($(10)_3 = 720$) OR two equal. The two equal
can be worked out as follows:
choose the repeated letter (3 ways) AND
choose the positions for the two copies of the letter (3 ways) AND
choose the remaining letter ($10 - 1 = 9$ ways).
By the Rules of Sum and Product, we have $720 + 3 \times 9 \times 3 = 801$.
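The counts 90, 720 and 5040, and 93 and 801, can be checked by brute force. The Python sketch below (helper names are ours) builds the letter supply described in the two solutions, namely A, B, M, N, R, S, T once each and C, I, O twice each, and counts the distinct k-lists that respect that supply.

```python
from itertools import permutations
from math import perm

SUPPLY = list("ABMNRST") + list("CIO") * 2   # 13 letters in all


def lists_within_supply(k):
    """Distinct k-lists that use no letter more often than SUPPLY allows."""
    # permutations() picks k distinct positions of SUPPLY; the set removes
    # lists that differ only in which copy of C, I or O was used.
    return len(set(permutations(SUPPLY, k)))


def lists_all_distinct(k):
    """k-lists of distinct letters chosen from the 10 different letters."""
    return perm(10, k)


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, lists_all_distinct(k), lists_within_supply(k))
```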
(a) The letters are EILST. The number of 3-words is $(5)_3 = 60$.
(b) The answer is $5^3 = 125$.

(c) The letters are EILST, with T occurring 3 times and L occurring 2 times. Either the
letters are distinct OR one letter appears twice OR one letter appears three times.
We have seen that the first can be done in 60 ways. To do the second, choose one
of L and T to repeat, choose one of the remaining 4 different letters and choose
where that letter is to go, giving $2 \times 4 \times 3 = 24$. To do the third, use T. Thus,
the answer is $60 + 24 + 1 = 85$.


(a) Stripping off the initial R and terminal F, we are left with a list of at most 4 letters,
at least one of which is an L. There is just 1 such list of length 1. There are $3^2 - 2^2 = 5$
lists of length 2, namely all those made from E, I and L minus those made from just
E and I. Similarly, there are $3^3 - 2^3 = 19$ of length 3 and $3^4 - 2^4 = 65$ of length 4. This gives
us a total of 90.


(b) The letters used are E, F, I, L and R in alphabetical order. To get the word before 
RELIEF, note that we cannot change just the F and/or the E to produce an earlier 
word. Thus we must change the I to get the preceding word. The first candidate in 
alphabetical order is F, giving us RELF. Working backwards in this manner, we come 
to RELELF, RELEIF, RELEF and, finally, RELEEF. 


(a) If there are 4 letters besides R and F, then there is only one R and one F, for a
total of 65 spellings by the previous problem. If there are 3 letters besides R and F, we
may have R⋯F, R⋯FF or RR⋯F, which gives us $3 \times 19 = 57$ words by the previous
problem. We’ll say there are 3 RF patterns, namely RF, RFF and RRF. If there are 2
letters besides R and F, there are 6 RF patterns, namely the three just listed, RFFF,
RRFF and RRRF. This gives us $6 \times 5 = 30$ words. Finally, the last case has the 6 RF
patterns just listed as well as RFFFF, RRFFF, RRRFF and RRRRF for a total of 10
patterns. This gives us 10 words since the one remaining letter must be L. Adding up
all these cases gives us $65 + 57 + 30 + 10 = 162$ possible spellings. Incidentally, there is
a simple formula for the number of RF patterns of length n, namely $n-1$. Thus there are
$$1 + 2 + \cdots + (n-1) = n(n-1)/2$$
of length at most n. This gives our previous counts of 1, 3, 6 and 10.


(b) Reading toward the front of the dictionary from RELIEF we have RELIEF, 
RELFFF, RELFF, RELF, RELELF, RELEIF, RELEFF,..., and so the spelling five 
before RELIEF is RELEIF. 
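Both parts can be checked by generating every spelling and sorting. The Python sketch below encodes our reading of the rules used in these two solutions (an assumption, since the exercise statement is not repeated here): a spelling is one or more R’s, then a nonempty string over E, I, L containing at least one L, then one or more F’s, with at most 6 letters in all; in the first version there is exactly one R and one F. Python compares strings of capital letters in dictionary order, so `sorted` gives the order used above.

```python
from itertools import product


def spellings(max_len=6, repeat_rf=False):
    """Sorted list of spellings under the assumed rules (see the lead-in).

    A spelling is a run of R's, a nonempty middle over E, I, L containing
    at least one L, and a run of F's, with at most max_len letters in all.
    With repeat_rf=False both runs have length exactly 1.
    """
    max_run = max_len if repeat_rf else 1
    words = []
    for a in range(1, max_run + 1):                   # number of R's
        for b in range(1, max_run + 1):               # number of F's
            for k in range(1, max_len - a - b + 1):   # length of the middle
                for middle in product("EIL", repeat=k):
                    if "L" in middle:
                        words.append("R" * a + "".join(middle) + "F" * b)
    return sorted(words)   # capital-letter strings sort in dictionary order


def five_before(words, target="RELIEF"):
    """The word five places before target in the sorted list."""
    return words[words.index(target) - 5]


if __name__ == "__main__":
    one_rf, many_rf = spellings(), spellings(repeat_rf=True)
    print(len(one_rf), five_before(one_rf))
    print(len(many_rf), five_before(many_rf))
```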


There are $n!/(n-k)!$ lists of length k. The total number of lists (not counting the
empty list) is
$$\frac{n!}{(n-1)!} + \frac{n!}{(n-2)!} + \cdots + \frac{n!}{1!} + \frac{n!}{0!} = n! \sum_{i=0}^{n-1} \frac{1}{i!}.$$
Since $e = e^1 = \sum_{i=0}^{\infty} 1^i/i!$, it follows that the sum $\sum_{i=0}^{n-1} 1/i!$ above is close to e, so the
total number of lists is close to $e \cdot n!$.
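A quick numerical check of this formula, as a Python sketch with our own helper names: count the nonempty lists of distinct elements from an n-set by enumeration, compare the result with $n!\sum_{i=0}^{n-1} 1/i!$ computed in exact integer arithmetic, and look at the ratio to $n!$.

```python
from itertools import permutations
from math import e, factorial


def count_lists_brute(n):
    """Nonempty lists of distinct elements of an n-set, by enumeration."""
    return sum(1 for k in range(1, n + 1) for _ in permutations(range(n), k))


def count_lists_formula(n):
    """n! * (1/0! + 1/1! + ... + 1/(n-1)!); each term n!/i! is an integer."""
    return sum(factorial(n) // factorial(i) for i in range(n))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, count_lists_formula(n), count_lists_formula(n) / factorial(n))
    print("e =", e)
```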


Choose values for pairs
AND choose suits for the lowest value pair
AND choose suits for the middle value pair
AND choose suits for the highest value pair.

This gives $\binom{13}{3}\binom{4}{2}^3 = 61{,}776$.


Choose the lowest value in the straight (A to 10) AND choose a suit for each of the
5 values in the straight. This gives $10 \times 4^5 = 10240$.

Although the previous answer is acceptable, a poker player may object since a
“straight flush” is better than a straight, and we included straight flushes in our
count. Since a straight flush is a straight all in the same suit, we only have 4 choices of
suits for the cards instead of $4^5$. Thus, there are $10 \times 4 = 40$ straight flushes. Hence,
the number of straights which are not straight flushes is $10240 - 40 = 10200$.


If there are n 1’s in the sequence, there are $n-1$ spaces between the 1’s. Thus, there
are $2^{n-1}$ compositions of n. A composition of n with k parts has $k-1$ commas. The
number of ways to insert $k-1$ commas into $n-1$ positions is $\binom{n-1}{k-1}$.
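The comma-and-plus description translates directly into code. This Python sketch (our own) turns each of the $2^{n-1}$ patterns of choices in the $n-1$ gaps into a composition, so it can be used to list the compositions of a small n and to check the count $\binom{n-1}{k-1}$ for compositions with k parts.

```python
from math import comb


def compositions(n):
    """All compositions of n >= 1, one for each choice of "+" or "," in n - 1 gaps."""
    result = []
    for mask in range(2 ** (n - 1)):
        parts, current = [], 1
        for gap in range(n - 1):
            if (mask >> gap) & 1:   # a comma ends the current part
                parts.append(current)
                current = 1
            else:                   # a plus joins the next 1 to the current part
                current += 1
        parts.append(current)
        result.append(tuple(parts))
    return result


if __name__ == "__main__":
    print(sorted(compositions(4), reverse=True))
    n = 7
    for k in range(1, n + 1):
        with_k = sum(1 for c in compositions(n) if len(c) == k)
        print(k, with_k, comb(n - 1, k - 1))
```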
Note that EXERCISES contains 3 E’s, 2 S’s and 1 each of C, I, R and X. We can use
the multinomial coefficient
$$\binom{n}{m_1, m_2, \ldots, m_k} = \frac{n!}{m_1!\,m_2!\cdots m_k!},$$
where $n = m_1 + m_2 + \cdots + m_k$. Take $n = 9$, $m_1 = 3$, $m_2 = 2$ and $m_3 = m_4 = m_5 = m_6 = 1$.
This gives $9!/(3!\,2!) = 30240$. This calculation can also be done without the use
of a multinomial coefficient as follows. Choose 3 of the 9 possible positions to use for
the three E’s AND choose 2 of the 6 remaining positions to use for the two S’s AND
put a permutation of the remaining 4 letters in the remaining 4 places. This gives us
$$\binom{9}{3} \times \binom{6}{2} \times 4!.$$


An arrangement is a list formed from 13 things each used 4 times. Thus we have
$n = 52$ and $m_i = 4$ for $1 \le i \le 13$ in the multinomial coefficient
$$\binom{n}{m_1, m_2, \ldots, m_k} = \frac{n!}{m_1!\,m_2!\cdots m_k!}.$$
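Both multinomial calculations can be evaluated exactly with integer arithmetic. The Python sketch below (the function name `multinomial` is ours) divides $n!$ by each $m_i!$ in turn. Every intermediate quotient is itself a multinomial coefficient, so each integer division is exact.

```python
from math import comb, factorial


def multinomial(*counts):
    """n! / (m_1! m_2! ... m_k!) with n = m_1 + ... + m_k, in exact integers."""
    result = factorial(sum(counts))
    for m in counts:
        result //= factorial(m)   # each partial quotient is still an integer
    return result


if __name__ == "__main__":
    print(multinomial(3, 2, 1, 1, 1, 1))              # EXERCISES: 30240
    print(comb(9, 3) * comb(6, 2) * factorial(4))     # the same count, other way
    print(multinomial(*[4] * 13))                      # 13 things, each used 4 times
```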


(a) The first 4 names in dictionary order are LALALAL, LALALAS, LALALAT, 
LALALIL. 


(b) The last 4 names in dictionary order are TSITSAT, TSITSIL, TSITSIS, TSITSIT. 
(c) To compute the names, we first find the possible consonant vowel patterns. They
are CCVCCVC, CCVCVCC, CVCCVCC and CVCVCVC. The first three each
contain two pairs of adjacent consonants, one isolated consonant and two vowels.
Thus each corresponds to $(3 \times 2)^2 \times 3 \times 2^2$ names. The last has four isolated
consonants and three vowels and so corresponds to $3^4 \times 2^3$ names. In total, there
are 1,944 names.
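These counts can be confirmed by brute force over all $5^7 = 78{,}125$ strings of seven letters from A, I, L, S, T. The rules coded in `is_name` below are our reconstruction, chosen because they produce exactly the four patterns listed: begin and end with a consonant, no two adjacent vowels, no three consonants in a row, and adjacent consonants different. Treat them as an assumption about the exercise rather than as its statement.

```python
from itertools import product

VOWELS, CONSONANTS = set("AI"), set("LST")


def is_name(word):
    """Assumed rules (reconstructed from the four patterns; see the lead-in)."""
    if word[0] in VOWELS or word[-1] in VOWELS:
        return False                       # begin and end with a consonant
    for x, y in zip(word, word[1:]):
        if x in VOWELS and y in VOWELS:
            return False                   # no two adjacent vowels
        if x in CONSONANTS and x == y:
            return False                   # adjacent consonants must differ
    for x, y, z in zip(word, word[1:], word[2:]):
        if x in CONSONANTS and y in CONSONANTS and z in CONSONANTS:
            return False                   # no three consonants in a row
    return True


NAMES = sorted("".join(w) for w in product("AILST", repeat=7) if is_name(w))

if __name__ == "__main__":
    print(len(NAMES), NAMES[:4], NAMES[-4:])
```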


The first identity can be proved by writing the binomial coefficients in terms of facto- 
rials. It can also be proved from the definition of the binomial coefficient: Choosing a 
set of size k from a set of size n is equivalent to choosing a set of size n — k to throw 
away, namely the things not chosen. 


The total number of subsets of an n element set is $2^n$. On the other hand, we can
divide the subsets into collections $T_i$, where $T_i$ contains all the i element subsets. The
number of subsets in $T_i$ is $\binom{n}{i}$. Apply the Rule of Sum.


$S(n,n) = 1$: The only way to partition an n element set into n blocks is to put each
element in a block by itself, so $S(n,n) = 1$.

$S(n,n-1) = \binom{n}{2}$: The only way to partition an n element set into $n-1$ blocks is to
choose two elements to be in a block together and put the remaining $n-2$ elements
in $n-2$ blocks by themselves. Thus it suffices to choose the 2 elements that appear in
a block together and so $S(n,n-1) = \binom{n}{2}$.

$S(n,1) = 1$: The only way to partition a set into one block is to put the entire set
into the block.

$S(n,2) = (2^n - 2)/2$: We give two solutions. Note that $S(n,k)$ is the number of k-sets
S where the entries in S are nonempty subsets of a given n-set T and each element
of T appears in exactly one entry of S. We will count k-lists, which is $k!$ times the
number of k-sets. We choose a subset for the first block (first list entry) and use the
remaining set elements for the second block. Since an n-set has $2^n$ subsets, this would seem
to give $2^n/2$; however, we must avoid empty blocks. In the ordered case, there are two
ways this could happen since either the first or second list entry could be the empty
set. Thus, we must have $2^n - 2$ instead of $2^n$. The answer is $(2^n - 2)/2$.

Here is another way to compute $S(n,2)$. Look at the block containing n. Once
it is determined, the entire two block partition is determined. The block containing n
can be gotten by starting with n and adjoining one of the $2^{n-1} - 1$ proper subsets of
$\{1, 2, \ldots, n-1\}$.
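The four formulas can be checked against a brute-force count of set partitions. The recursive generator below is a standard construction written for this sketch (not taken from the text): each partition of the remaining elements is extended by putting the first element in a new block of its own or into one of the existing blocks.

```python
from math import comb


def set_partitions(elements):
    """Yield each partition of the list `elements` into nonempty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition                      # first in a block alone
        for i in range(len(partition)):                  # or joined to block i
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def stirling2(n, k):
    """S(n, k) by brute force: partitions of {1, ..., n} with exactly k blocks."""
    return sum(1 for p in set_partitions(list(range(1, n + 1))) if len(p) == k)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, stirling2(n, n - 1), comb(n, 2), stirling2(n, 2), (2 ** n - 2) // 2)
```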


We use the hint. Choose i elements of $\{1, 2, \ldots, n\}$ to be in the block with $n+1$ AND
either do nothing else if $i = n$ OR partition the remaining elements. This gives $\binom{n}{i}$ if
$i = n$ and $\binom{n}{i} B_{n-i}$ otherwise. If we set $B_0 = 1$, the second formula applies for $i = n$,
too. Since $i = 0$ OR $i = 1$ OR ⋯ OR $i = n$, the result follows.

(b) To calculate $B_n$ for $n \le 5$: We have $B_0 = 1$ from (a). Using the formula in (a) for
$n = 0, 1, 2, 3, 4$ in order, we obtain $B_1 = 1$, $B_2 = 2$, $B_3 = 5$, $B_4 = 15$ and $B_5 = 52$.
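The recurrence from (a) gives a short program for the Bell numbers. In this Python sketch (our own), `b[j]` holds $B_j$, and each new value is $B_{n+1} = \sum_{i=0}^{n}\binom{n}{i}B_{n-i}$.

```python
from math import comb


def bell_numbers(limit):
    """[B_0, B_1, ..., B_limit] via B_{n+1} = sum_{i=0}^{n} C(n, i) B_{n-i}."""
    b = [1]                                   # B_0 = 1
    for n in range(limit):
        b.append(sum(comb(n, i) * b[n - i] for i in range(n + 1)))
    return b


if __name__ == "__main__":
    print(bell_numbers(5))   # [1, 1, 2, 5, 15, 52]
```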


(a) There is exactly one arrangement — 1,2,3,4,5,6,7,8,9. 


(b) We do this by counting those arrangements that have $a_i < a_{i+1}$ except, perhaps,
for $i = 5$. Then we subtract off those that also have $a_5 < a_6$. In set terms:

• S is the set of rearrangements for which $a_1 < a_2 < a_3 < a_4 < a_5$ and $a_6 < a_7 < a_8 < a_9$,

• T is the set of rearrangements for which $a_1 < a_2 < \cdots < a_9$, and

• we want $|S \setminus T| = |S| - |T|$.

An arrangement in S is completely determined by specifying the set $\{a_1, \ldots, a_5\}$,
of which there are $\binom{9}{5} = 126$. In (a), we saw that $|T| = 1$. Thus the answer is
$126 - 1 = 125$.
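A brute-force check over all $9! = 362{,}880$ arrangements, as a Python sketch with our own function name: it counts the arrangements in S (every step rises except possibly the step from $a_5$ to $a_6$) and, among them, those where that step actually falls.

```python
from itertools import permutations
from math import comb


def count_s_and_answer(n=9, i0=5):
    """Return (|S|, |S minus T|) for arrangements a_1, ..., a_n of 1, ..., n.

    S: a_i < a_{i+1} for every i except possibly i = i0.
    S minus T: those members of S with a_{i0} > a_{i0+1}.
    """
    size_s = answer = 0
    for a in permutations(range(1, n + 1)):
        # a[i - 1] and a[i] are a_i and a_{i+1} in the text's 1-based notation.
        if all(a[i - 1] < a[i] for i in range(1, n) if i != i0):
            size_s += 1
            if a[i0 - 1] > a[i0]:
                answer += 1
    return size_s, answer


if __name__ == "__main__":
    print(count_s_and_answer(), comb(9, 5))
```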


Let the probability space consist of all $\binom{6}{2} = 15$ pairs of horses and use the uniform
probability. Thus each pair has probability 1/15. Since each horse is in exactly 5 pairs,
the probability of your choosing the winner is $5/15 = 1/3$, regardless of which horse
wins.

Here is another way. You could choose your first horse and your second horse, so
the space consists of $6 \times 5$ choices. The probability that your first choice was the winner
is 1/6. The probability that your second choice was the winner is also 1/6. Since these
events are disjoint, the probability of picking the winner is $1/6 + 1/6 = 1/3$.

Usually the probability of winning a bet on a horse race depends on picking the
fastest horse after much study. The answer to this problem, 1/3, doesn’t seem to have
anything to do with studying the horses. Why?


The sample space is $\{0, 1, \ldots, 36, 00\}$. We have $P(0) = P(1) = \cdots = P(36)$ and
$P(00) = 1.05\,P(0)$. Thus
$$1 = P(0) + \cdots + P(36) + P(00) = 38.05\,P(0).$$
Hence $P(0) = 1/38.05$ and so $P(00) = 1.05/38.05 \approx 0.0276$.


Let the event space be {A, B}, depending on who finds the key. Since Alice searches 
20% faster than Bob, it is reasonable to assume that P(A) = 1.2 P(B). The odds that 
Alice finds the key are P(A)/P(B) = 1.2, that is, 1.2:1, which can also be written 
as 6:5. Combining $P(A) = 1.2\,P(B)$ with $P(A) + P(B) = 1$, we find that $P(A) =
1.2/2.2 \approx 0.545$.


Let A be the event that you pick the winner and B the event that you pick the
horse that places. From a previous exercise, $P(A) = 1/3$. Similarly, $P(B) = 1/3$. We
want $P(A \cup B)$. By the principle of inclusion and exclusion, this is $P(A) + P(B) - P(A \cap B)$.
Of all $\binom{6}{2} = 15$ choices, only one is in $A \cap B$. Thus $P(A \cap B) = 1/15$ and
the answer is $1/3 + 1/3 - 1/15 = 3/5$.
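Both horse-race probabilities can be confirmed by listing the 15 equally likely pairs with exact fractions. In this Python sketch the horses are labeled 0 to 5, horse 0 wins and horse 1 places; these labels are arbitrary choices made only for the illustration.

```python
from fractions import Fraction
from itertools import combinations

HORSES = range(6)
PAIRS = list(combinations(HORSES, 2))   # the 15 equally likely bets
WINNER, PLACER = 0, 1                   # arbitrary labels for this check


def prob(event):
    """Uniform probability of the set of pairs satisfying the predicate `event`."""
    return Fraction(sum(1 for pair in PAIRS if event(pair)), len(PAIRS))


p_win = prob(lambda pair: WINNER in pair)
p_win_or_place = prob(lambda pair: WINNER in pair or PLACER in pair)

if __name__ == "__main__":
    print(p_win, p_win_or_place)   # 1/3 3/5
```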


Since probabilities are uniform, we simply count the number of events that satisfy
the conditions and divide by the total number of events, which is $m^n$ for n balls and
m boxes. First we will do the problems in an ad hoc manner, then we’ll discuss a
systematic solution. We use (a′)–(c′) to denote the answers for (d).


(a) We place one ball in the first box AND one in the second AND so on. Since this 
can be done in 4! ways, the answer is 4!/4* = 3/32. 


(a’) We must have one box with two balls and one ball in each of the other three boxes. 
We choose one box to contain two balls AND two balls for the box AND distribute 
the three remaining balls into three boxes as in (a). This gives us 4x (3) x3! = 240. 
Thus the answer is 240/4° = 15/64. 
(b) This is somewhat like (a'). Choose a box to be empty AND choose a box to
contain two balls AND choose two balls for the box AND distribute the other two
balls into the other two boxes. This gives $4 \times 3 \times \binom{4}{2} \times 2! = 144$. Thus the answer
is $144/4^4 = 9/16$.


(b') This is more complicated since the ball counts can be either 3,1,1,0 or 2,2,1,0. As
in (b), there are $4 \times 3 \times \binom{5}{3} \times 2! = 240$ ways to do the first. In the second, there are
$\binom{4}{2} \times 2 = 12$ ways to designate the boxes and $\binom{5}{2} \times \binom{3}{2} = 30$ ways to choose the
balls for the boxes that contain two each. Thus there are 360 ways and the answer
is $(240 + 360)/4^5 = 75/128$.
(c) Simply subtract the answer for (a) from 1 since we are asking for the complementary
event. This gives $1 - 3/32 = 29/32$. For (c') we have $1 - 15/64 = 49/64$.


We now consider a systematic approach. Suppose we want to assign $n$ balls to $m$
boxes so that exactly $k \le m$ of the boxes contain balls. Call the balls $1,2,\ldots,n$.
First partition the set of $n$ balls into $k$ blocks. This can be done in $S(n,k)$ ways,
where $S(n,k)$ is the Stirling number discussed in Section 3. List the blocks in some
order (pick your favorite; e.g., numerical order based on the smallest element in the
block). Assign the first block to a box AND assign the second block to a different box AND,
etc. This can be done in $m(m-1)\cdots(m-k+1) = m!/(m-k)!$ ways. Hence the
number of ways to distribute the balls is $S(n,k)\,m!/(m-k)!$ and so the probability is
$S(n,k)\,m!/\big((m-k)!\,m^n\big)$. For our particular problems, the answers are


(a) $S(4,4)\,4!/(0!\,4^4) = 3/32$ (a') $S(5,4)\,4!/(0!\,4^5) = 15/64$
(b) $S(4,3)\,4!/(1!\,4^4) = 9/16$ (b') $S(5,3)\,4!/(1!\,4^5) = 75/128$.


The moral here is that if you can think of a systematic approach to a class of problems, 
it is likely to be easier than solving each problem separately. 
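The systematic formula can be checked against brute force. The following Python sketch (the helper names `stirling2`, `prob_formula` and `prob_brute_force` are hypothetical, chosen for illustration) computes $S(n,k)$ from the standard recurrence and compares $S(n,k)\,m!/((m-k)!\,m^n)$ with a direct count over all $m^n$ assignments, for the four cases above with $m = 4$ boxes.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def stirling2(n, k):
    """Stirling number of the second kind, via S(n, k) = k S(n-1, k) + S(n-1, k-1)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def prob_formula(n, m, k):
    """P(exactly k of m boxes get balls) = S(n,k) m!/(m-k)! / m^n."""
    return Fraction(stirling2(n, k) * factorial(m) // factorial(m - k), m ** n)

def prob_brute_force(n, m, k):
    """Same probability, by listing all m^n ways to drop n labeled balls into m boxes."""
    hits = sum(1 for boxes in product(range(m), repeat=n) if len(set(boxes)) == k)
    return Fraction(hits, m ** n)

for n, k in [(4, 4), (5, 4), (4, 3), (5, 3)]:
    print(n, k, prob_formula(n, 4, k), prob_brute_force(n, 4, k))
```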


(a) Since the die is thrown $k$ times, the sample space is $S^k$, where $S = \{1,2,3,4,5,6\}$.
Since the die is fair, all $6^k$ sequences in $S^k$ are equally likely. We claim that exactly
half have an even sum and so $P(E) = 1/2$. Why do half have an even sum? Here are
two proofs.


• Let $N_o(n)$ be the number of sequences of $n$ throws with an odd sum and let $N_e(n)$ be the
number with an even sum. We have

$$N_e(k) = 3N_e(k-1) + 3N_o(k-1) \qquad\text{and}\qquad N_o(k) = 3N_o(k-1) + 3N_e(k-1)$$

because an even sum is obtained from an even one by throwing 2, 4, or 6 and from an
odd one by throwing 1, 3, or 5; and similarly for an odd sum. Thus $N_e(k) = N_o(k)$.
Since the probability on $S^k$ is uniform, the probability of an even sum is 1/2.


• Let $S_o$ be all the $k$-lists in $S^k$ with odd sum and let $S_e$ be those with even sum.
Define the function $f : S^k \to S^k$ as follows:

$$f(x_1, x_2, \ldots, x_k) = \begin{cases} (x_1 + 1, x_2, \ldots, x_k), & \text{if } x_1 \text{ is odd;} \\ (x_1 - 1, x_2, \ldots, x_k), & \text{if } x_1 \text{ is even.} \end{cases}$$

We leave it to you to convince yourself that this function is a bijection between
$S_o$ and $S_e$. (A bijection is a one-to-one correspondence between elements of $S_o$
and $S_e$.)
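Both proofs can be spot-checked for small $k$. This Python sketch (the names `f` and `check` are illustrative) confirms that half of the $6^k$ lists have an even sum and that the map $f$ above sends the odd-sum lists one-to-one onto the even-sum lists, for $k = 1, \ldots, 5$.

```python
from itertools import product

def f(throws):
    """The map from the text: x1 -> x1 + 1 if x1 is odd, x1 - 1 if x1 is even."""
    x1 = throws[0]
    new_x1 = x1 + 1 if x1 % 2 == 1 else x1 - 1
    return (new_x1,) + throws[1:]

def check(k):
    lists = list(product(range(1, 7), repeat=k))
    odd = [t for t in lists if sum(t) % 2 == 1]
    even = {t for t in lists if sum(t) % 2 == 0}
    images = {f(t) for t in odd}
    # Half the lists have an even sum, and f maps the odd-sum lists
    # one-to-one onto the even-sum lists.
    return 2 * len(even) == len(lists) and len(images) == len(odd) and images == even

print([check(k) for k in range(1, 6)])
```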
(b) The sample space for drawing cards $n$ times is $S^n$, where $S$ is the Cartesian product

$$\{A, 2, 3, \ldots, 10, J, Q, K\} \times \{\clubsuit, \diamondsuit, \heartsuit, \spadesuit\}.$$

The probability of any point in $S^n$ is $(1/52)^n$. The number of draws with no king is
$(52-4)^n$ and so the probability of none is $(48/52)^n = (12/13)^n$. The probability of at
least one king is $1 - (12/13)^n$.


(c) The equiprobable sample space is gotten by distinguishing the marbles
$M = \{w_1, w_2, w_3, r_1, \ldots\}$ and defining the sample space by

$$S = \{(m, m') : m \text{ and } m' \text{ are distinct elements of } M\}.$$

If $E_r$ is the event that both $m$ and $m'$ are red, then $P(E_r) = 4 \cdot 3/|S|$ where $|S| = 12 \cdot 11$;
that is, $P(E_r) = 12/132 = 1/11$.


RELATED PROBLEMS TO THINK ABOUT: What is the probability of two white 
and two blue marbles being drawn if four marbles are drawn without replacement? Of 
two white and two blue marbles being drawn if four marbles are drawn with replace- 
ment? 


CL-4.7 This is nearly identical to the example on hypergeometric probabilities. The answer 
is C(5,3)C(10, 3)/C(15, 6). 


CL-4.8 Let $B = \{1,2,\ldots,10\}$.

(a) The sample space $S$ is the set of all subsets of $B$ of size 2. Thus $|S| = \binom{10}{2} = 45$.
Since each draw is equally likely, we just need to know how many pairs have an
odd sum. One of the balls must have an odd label and the other an even label.
The number of pairs with this property is $5 \times 5$ since there are 5 odd labels and
5 even labels. Thus the probability is $25/45 = 5/9$.

(b) The sample space $S$ is the set of ordered pairs $(b_1, b_2)$ with $b_1 \ne b_2$, both from $B$.
Thus $|S| = 10 \times 9 = 90$. To get an odd sum, one of $b_1$ and $b_2$ must be even and
the other odd. Thus there are 10 choices for $b_1$ AND then 5 choices for $b_2$. The
probability is $50/90 = 5/9$.

(c) The sample space is $S = B \times B$ and $|S| = 100$. The number of pairs $(b_1, b_2)$ with an odd sum is 50,
as in (b). Thus the probability is $50/100 = 1/2$.
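The three sampling schemes differ only in the sample space, which makes them easy to enumerate directly. A minimal Python sketch (the helper name is hypothetical):

```python
from fractions import Fraction
from itertools import combinations, permutations, product

B = range(1, 11)

def odd_sum_probability(sample_space):
    """Return (P(odd sum), |S|) for an equiprobable sample space of pairs."""
    pairs = list(sample_space)
    odd = sum(1 for pair in pairs if sum(pair) % 2 == 1)
    return Fraction(odd, len(pairs)), len(pairs)

print(odd_sum_probability(combinations(B, 2)))    # (a) unordered, no replacement
print(odd_sum_probability(permutations(B, 2)))    # (b) ordered, no replacement
print(odd_sum_probability(product(B, repeat=2)))  # (c) ordered, with replacement
```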


CL-4.9 This is an inclusion and exclusion type of problem. There are three ways to approach 
such problems: 


e Have a variety of formulas handy that you can plug into. This, by itself, is not 
a good idea because you may encounter a problem that doesn’t fit any of the 
formulas you know. 


e Draw a Venn diagram and use the information you have to compute the probability 
of as many regions as you can. If there are more than 3 sets, the Venn diagram is 
too confusing to be very useful. With 2 or 3 sets, it is a good approach. 


e Carry out the preceding idea without the picture. We do this here. 


Suppose we are dealing with $k$ sets, $A_1, \ldots, A_k$. We need to know what the regions in
the Venn diagram are. Each region corresponds to $T_1 \cap \cdots \cap T_k$, where $T_i$ is either $A_i$
or $A_i^c$. In our case, $k = 2$ and so the probabilities of the regions are

$$P(A \cap B), \qquad P(A \cap B^c), \qquad P(A^c \cap B), \qquad P(A^c \cap B^c).$$
We get $A$ by combining $A \cap B$ and $A \cap B^c$. We get $B$ by combining $A \cap B$ and $A^c \cap B$.
By properties of sets, $(A \cup B)^c = A^c \cap B^c$. Thus our data correspond to the three
equations

$$P(A \cap B) + P(A \cap B^c) = 3/8, \qquad P(A \cap B) + P(A^c \cap B) = 1/2, \qquad P(A^c \cap B^c) = 3/8.$$

We have one other equation: the probabilities of all four regions sum to 1. This gives
us four equations in four unknowns whose solution is

$$P(A \cap B) = 1/4, \qquad P(A \cap B^c) = 1/8, \qquad P(A^c \cap B) = 1/4, \qquad P(A^c \cap B^c) = 3/8.$$

Thus the answer to the problem is 1/4.

When we are not asked for the probability of all regions, it is often possible to
take shortcuts. That is the case here. From $P((A \cup B)^c) = 3/8$ we have
$P(A \cup B) = 1 - 3/8 = 5/8$. Since $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ and three of the four terms
in this equation are known, we can easily solve for $P(A \cap B)$: it is $3/8 + 1/2 - 5/8 = 1/4$.

CL-4.10 This is another Venn diagram problem. This time we'll work with numbers of people
instead of probabilities. Let $C$ be the set of computer science majors, $W$
the set of women, and $S$ the entire student body. We are given

$$|C| = 20\% \times 5{,}000 = 1{,}000, \qquad |W| = 58\% \times 5{,}000 = 2{,}900, \qquad |C \cap W| = 430.$$

(a) We want $|W \cap C^c|$, which equals $|W| - |W \cap C| = 2{,}470$. You should be able to
see why this is so by the Venn diagram or by the method used in the previous
problem.

(b) The number of men who are computer science majors is the number of computer
science majors who are not women. This is $|C| - |C \cap W| = 1{,}000 - 430 = 570$. The
number of men in the student body is $42\% \times 5{,}000 = 2{,}100$. Thus $2{,}100 - 570 = 1{,}530$
men are not computer science majors.

(c) The probability is $430/5{,}000 = 0.086$.

(d) Since there are $58\% \times 5{,}000 = 2{,}900$ women, the probability is $430/2{,}900 \approx 0.148$.

CL-4.11 Since the coin is fair, $P(H) = 1/2$. What about $P(W)$, the probability that Beatlebomb
wins? Recall the meaning of the English phrase "the odds that it will occur." This is
trivial but important, as the phrase is used often in everyday applications of probability.
If you don't recall the meaning, see the discussion of odds in the text. From the
definition of odds, you should be able to show that $P(W) = 1/101$. If we had studied
"independent" events, you could immediately see that the answer to the question is
$(1/2) \times (1/101) = 1/202$, but we need a different approach, which lets independent
events sneak in through the back door.

Let the sample space be $\{H,T\} \times \{W,L\}$, corresponding to the outcome of the coin
toss and the outcome of the race. From the previous paragraph, $P(\{(H,W), (T,W)\}) = 1/101$.
Since the coin is fair and the coin toss doesn't influence the race, we should
have $P((H,W)) = P((T,W))$. Since

$$P(\{(H,W), (T,W)\}) = P((H,W)) + P((T,W)),$$

it follows after a little algebra that $P((H,W)) = 1/202$.


CL-4.12 This is another example of the hypergeometric probability. Do you see why? The
answer is $C(37, 11)\,C(2, 2)/C(89, 13)$.


CL-4.13 It may seem at first that you need to break up the problem according to what the other
players have been dealt. Not so! You should be able to see that the results would have
been the same if you had been dealt your fifth card before the other players had been
dealt their cards. Now it's not hard to work things out. After you've been dealt 4
cards, there are 48 cards left. Of those, the fourth card in the 3 of a kind (the remaining 4 in the
example) and any of the 3 cards with the same value as your odd card (the other three 10s
in the example) improve your hand. That's 4 cards out of 48, so the probability is
$4/48 = 1/12$.


CL-4.14 (a) Let words of length 6 formed from three G's and three B's stand for the
arrangements in terms of Boys and Girls; for example, BBGGBG or BBBGGG. There are
$\binom{6}{3} = 6!/(3!\,3!) = 20$ such words. Four such words correspond to the three girls
together: GGGBBB, BGGGBB, BBGGGB, BBBGGG. The probability of three girls
being together is $4/20 = 1/5$.

(b) If they are then seated around a circular table, there are two additional arrangements
that will result in all three girls sitting together: GGBBBG and GBBBGG. The
probability is $6/20 = 3/10$.
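A brute-force check of both answers in Python, listing the 20 distinct words and testing adjacency. For the round table the sketch assumes seats 1 and 6 are neighbors, so it also looks for GGG across the wrap-around. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# The 20 distinct words with three G's and three B's.
words = sorted({"".join(p) for p in permutations("GGGBBB")})

def together_in_a_row(word):
    return "GGG" in word

def together_at_round_table(word):
    # Seat 6 is next to seat 1, so also look across the wrap-around.
    return "GGG" in word + word[:2]

p_row = Fraction(sum(map(together_in_a_row, words)), len(words))
p_round = Fraction(sum(map(together_at_round_table, words)), len(words))
print(len(words), p_row, p_round)
```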


CL-4.15 You can draw the Venn diagram for three sets and, for each of the eight regions, count 
how much a point in the region contributes to the addition and subtraction. This does 
not extend to the general case. We give another proof that does. 

Let $S$ be the sample space and let $T$ be a subset of $S$. Define the function $\chi_T$ with
domain $S$ by

$$\chi_T(s) = \begin{cases} 1 & \text{if } s \in T, \\ 0 & \text{if } s \notin T. \end{cases}$$

This is called the characteristic function of $T$.¹ We leave it to you to check that

$$\chi_{T^c}(s) = 1 - \chi_T(s), \qquad \chi_{T \cap U}(s) = \chi_T(s)\,\chi_U(s), \qquad\text{and}\qquad P(T) = \sum_{s \in S} P(s)\,\chi_T(s).$$

¹ $\chi$ is a lower case Greek letter and is pronounced like the "ki" in "kind."
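Using these equations and a little algebra, we have

$$\begin{aligned}
P(A^c \cap B^c \cap C^c) &= \sum_{s \in S} P(s)\,\chi_{A^c \cap B^c \cap C^c}(s) \\
&= \sum_{s \in S} P(s)\big(1 - \chi_A(s)\big)\big(1 - \chi_B(s)\big)\big(1 - \chi_C(s)\big) \\
&= \sum_{s \in S} P(s) - \sum_{s \in S} P(s)\chi_A(s) - \sum_{s \in S} P(s)\chi_B(s) - \sum_{s \in S} P(s)\chi_C(s) \\
&\quad + \sum_{s \in S} P(s)\chi_A(s)\chi_B(s) + \sum_{s \in S} P(s)\chi_A(s)\chi_C(s) \\
&\quad + \sum_{s \in S} P(s)\chi_B(s)\chi_C(s) - \sum_{s \in S} P(s)\chi_A(s)\chi_B(s)\chi_C(s) \\
&= 1 - P(A) - P(B) - P(C) \\
&\quad + P(A \cap B) + P(A \cap C) \\
&\quad + P(B \cap C) - P(A \cap B \cap C).
\end{aligned}$$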
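The identity can also be tested numerically. This Python sketch (helper names illustrative) builds a small sample space with randomly chosen, non-uniform probabilities and random events $A$, $B$, $C$, computes every probability as $\sum_s P(s)\chi_T(s)$, and compares the two sides using exact fractions. Passing such tests is evidence, not a proof.

```python
import random
from fractions import Fraction

def chi(T):
    """Characteristic function of T: 1 on T, 0 off T."""
    return lambda s: 1 if s in T else 0

def prob(P, T):
    """P(T) = sum over s in S of P(s) * chi_T(s)."""
    chi_T = chi(T)
    return sum(P[s] * chi_T(s) for s in P)

def check(seed, size=12):
    rng = random.Random(seed)
    S = set(range(size))
    weights = {s: rng.randint(1, 9) for s in S}
    total = sum(weights.values())
    P = {s: Fraction(w, total) for s, w in weights.items()}   # an arbitrary, non-uniform P
    A, B, C = (set(rng.sample(sorted(S), 5)) for _ in range(3))
    lhs = prob(P, (S - A) & (S - B) & (S - C))
    rhs = (1 - prob(P, A) - prob(P, B) - prob(P, C)
           + prob(P, A & B) + prob(P, A & C) + prob(P, B & C)
           - prob(P, A & B & C))
    return lhs == rhs

print(all(check(seed) for seed in range(50)))
```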


CL-4.16 Let the stick have unit length and let $x$ be the distance from the end of the stick where
the break is made. Thus $0 < x < 1$. The longer piece will be at least twice the length
of the shorter if $x \le 1/3$ or if $x \ge 2/3$. The probability of this is $1/3 + 1/3 = 2/3$. You
should be able to fill in the details.


CL-4.17 Let $x$ and $y$ be the places where the stick is broken. Thus $(x, y)$ is chosen uniformly at
random in the square $S = (0,1) \times (0,1)$. Three pieces form a triangle if the sum of the
lengths of any two is always greater than the length of the third. We must determine
which regions in $S$ satisfy this condition.

Suppose $x < y$. The lengths are then $x$, $y - x$, and $1 - y$. The conditions are

$$x + (y - x) > 1 - y, \qquad x + (1 - y) > y - x, \qquad\text{and}\qquad (y - x) + (1 - y) > x.$$

With a little algebra, these become

$$y > 1/2, \qquad y - x < 1/2, \qquad\text{and}\qquad x < 1/2,$$

respectively. If you draw a picture, you will see that this is a triangle of area 1/8.
If $x > y$, we obtain the same results with the roles of $x$ and $y$ reversed. Thus the
total area is $1/8 + 1/8 = 1/4$. Since $S$ has area 1, the probability is 1/4.
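A Monte Carlo sketch in Python that estimates the probability by breaking many simulated sticks at two uniform points. It assumes only the model above ($x$ and $y$ independent and uniform on (0,1)); the names are illustrative, and the estimate should land near 1/4.

```python
import random

def forms_triangle(x, y):
    """Break a unit stick at x and y; can the three pieces form a triangle?"""
    left, right = min(x, y), max(x, y)
    pieces = (left, right - left, 1 - right)
    # Each piece must be shorter than the sum of the other two, i.e. than 1 - piece.
    return all(piece < 1 - piece for piece in pieces)

def estimate(trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = sum(forms_triangle(rng.random(), rng.random()) for _ in range(trials))
    return hits / trials

print(estimate())   # should be close to 1/4
```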


CL-4.18 Look where the center of the coin lands. If it is within $d/2$ of a lattice point, the coin
covers the lattice point. Thus, there is a circle of diameter $d$ about each lattice point
and the coin covers a lattice point if and only if its center lands in one of the circles. We
need to compute the fraction of the plane covered by these circles. Since the pattern
repeats in a regular fashion, all we need to do is calculate the fraction of the square
$\{(x,y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}$ that contains parts of circles. There is a quarter circle
about each of the points (0,0), (0,1), (1,0) and (1,1) inside the square. Since the circles
have diameter $d \le 1$, the quarter circles have no area in common and so their total
area equals the area of the coin, $\pi d^2/4$. Since the area of the square is 1, the probability
that the coin covers a lattice point is $\pi d^2/4$.
CL-4.19 Select the three points uniformly at random from the circumference of the circle and
label them 1, 2, 3 going clockwise around the circle from the top of the circle. Let $E_1$
denote the event consisting of all such configurations where points 2 and 3 lie in the
half circle starting at 1 and going clockwise (180 degrees). Let $E_2$ denote the event
that points 3 and 1 lie in the half circle starting at 2 and going clockwise 180 degrees.
Let $E_3$ be defined similarly. Note that the events $E_1$, $E_2$, and $E_3$ are mutually
exclusive. (Draw a picture and think about this.) By our basic probability axioms, the
probability of the union is the sum of the probabilities $P(E_1) + P(E_2) + P(E_3)$. To
compute $P(E_1)$, imagine point 1 on the circle, consider its associated half circle and,
before looking at the other two points, ask "What is the probability that they lie in
that half circle?" Let $x$ be the number of degrees clockwise from point 1 to point 2 and
$y$ the number from 1 to 3. Thus $(x, y)$ is a point chosen uniformly at random in the
square $[0,360) \times [0,360)$. For event $E_1$ to occur, $(x, y)$ must lie in $[0,180) \times [0,180)$,
which is 1/4 of the original square. Thus $P(E_1) = 1/4$. (This can also be done using
independent events: the locations of points 2 and 3 are chosen independently so one
gets $(1/2) \times (1/2)$.) The probabilities of $E_2$ and $E_3$ are the same for the same reason,
so the probability that all three points lie in some semicircle is $3 \times 1/4 = 3/4$.

What is the probability that $k$ points selected uniformly at random on the
circumference of a circle lie in the same semicircle? Use the same method. The answer
is $k/2^{k-1}$.
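The general answer $k/2^{k-1}$ can be checked by simulation. The Python sketch below represents points as fractions of a full turn and uses a different but equivalent test from the argument above: the points lie in some half circle exactly when some gap between consecutive points is at least half the circle. Names and trial counts are illustrative.

```python
import random

def in_some_semicircle(points):
    """points are positions on the circle as fractions of a full turn, in [0, 1)."""
    pts = sorted(points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(1 - pts[-1] + pts[0])   # gap that wraps past the top of the circle
    # All points fit in a half circle exactly when some gap is at least half the circle.
    return max(gaps) >= 0.5

def estimate(k, trials=100_000, seed=0):
    rng = random.Random(seed)
    hits = sum(in_some_semicircle([rng.random() for _ in range(k)]) for _ in range(trials))
    return hits / trials

for k in (3, 4, 5):
    print(k, estimate(k), k / 2 ** (k - 1))
```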


Solutions for Functions

Fn-1.1 (a) We know the domain and range of $f$. $f$ is not an injection. Since no order is given
for the domain, the attempt to specify $f$ in one-line notation is meaningless (the ASCII
order +, <, >, ? is a possibility, but it is unusual enough in this context that explicitly
specifying it would be essential). If the attempt at specification makes any sense, it
tells us that $f$ is a surjection. We cannot give it in two-line form since we don't know
the function.

(b) We know the domain and range of $f$ and the domain has an implicit order. Thus
the one-line notation specifies $f$. It is an injection but not a surjection. In two-line
form, the domain is written in its implicit order on the top line with the one-line
notation beneath it.

(c) This function is specified and is an injection. In one-line notation it would be
(4,3,2), and, in two-line notation, 4, 3, 2 would be written beneath the domain
elements listed in order.

Fn-1.2 (a) If $f$ is an injection, then $|A| \le |B|$. Solution: Since $f$ is an injection, every element
of $A$ maps to a different element of $B$. Thus $B$ must have at least as many elements
as $A$.

(b) If $f$ is a surjection, then $|A| \ge |B|$. Solution: Since $f$ is a surjection, every element
of $B$ is the image of at least one element of $A$. Thus $A$ must have at least as many
elements as $B$.

(c) If $f$ is a bijection, then $|A| = |B|$. Solution: Combine the two previous results.

(d) If $|A| = |B|$, then $f$ is an injection if and only if it is a surjection. Solution:
Suppose that $f$ is an injection and not a surjection. Then there is some $b \in B$ which is
not the image of any element of $A$ under $f$. Hence $f$ is an injection from $A$ to $B - \{b\}$.
By (a), $|A| \le |B - \{b\}| < |B|$, contradicting $|A| = |B|$.

Now suppose that $f$ is a surjection and not an injection. Then there are $a \ne a'$ in $A$ such
that $f(a) = f(a')$. Consider the function $f$ with domain restricted to $A - \{a'\}$. It is
still a surjection to $B$ and so by (b) $|B| \le |A - \{a'\}| < |A|$, contradicting $|A| = |B|$.

(e) If $|A| = |B|$, then $f$ is a bijection if and only if it is an injection or it is a surjection.
Solution: By the previous part, if $f$ is either an injection or a surjection, then it is
both, which is the definition of a bijection.

Fn-1.3 (a) Since ID numbers are unique and every student has one, this is a bijection.


(b) This is a function since each student is born exactly once. It is not a surjection
since D includes dates that could not possibly be the birthday of any student; e.g., it
includes yesterday's date. It is not an injection. Why? You may very well know of two
people with the same birthday. If you don't, consider this. Most entering freshmen
are between 18 and 19 years of age. Consider the set F of those freshmen. Their
possible birth dates number at most 366 + 365, which
is smaller than the size of the set F. Thus, when we look at the function on F, it is
not injective.


(c) This is not a function. It is not defined for some dates because no student was
born on that date. For example, D includes yesterday's date.
(d) This is not a function because there are students whose GPAs are outside the range
2.0 to 3.5. (We cannot prove this without student record information, but we can be 
sure it is true.) 


(e) We cannot prove that it is a function without gaining access to student records; 
however, we can be sure that it is a function since we can be sure that each of the 
16 GPAs between 2.0 and 3.5 will have been obtained by many students. It is not 
a surjection since the codomain is larger than the domain. It is an injection since a 
student has only one GPA. 


$\{(1,a), (2,b), (3,c)\}$ is not a relation because $c \notin B$. The others are relations.
Among the relations, $\{(1,a), (2,b), (1,d)\}$ is not a functional relation because the
value of the function at 3 is not defined, and $\{(1,a), (2,b), (3,d), (1,b)\}$ is not a
functional relation because the value of the function at 1 is not uniquely defined. Thus only
$\{(3,a), (2,b), (1,a)\}$ is a functional relation.

Only the inverse of $\{(1,a), (2,b), (1,d)\}$ is a functional relation. We omit the
explanation.


(a) For (1,5,7,8)(2,3)(4)(6): $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 5 & 3 & 2 & 4 & 7 & 6 & 8 & 1 \end{pmatrix}$ is the two-line form and
(5,3,2,4,7,6,8,1) is the one-line form. (We'll omit the two-line form in the future
since it is simply the one-line form with 1,2,... placed above it.) The inverse is
(1,8,7,5)(2,3)(4)(6) in cycle form and (8,3,2,4,1,6,5,7) in one-line form.

(b) For $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 8 & 3 & 7 & 2 & 6 & 4 & 5 & 1 \end{pmatrix}$: The cycle form is (1,8)(2,3,7,5,6,4). Inverse:
cycle form is (1,8)(2,4,6,5,7,3); one-line form is (8,4,2,6,7,5,3,1).

(c) For (5,4,3,2,1), which is in one-line form: The cycle form is (1,5)(2,4)(3). The
permutation is its own inverse.

(d) For (5,4,3,2,1), which is in cycle form: This is not the standard form for cycle form.
Standard form is (1,5,4,3,2). The one-line form is (5,1,2,3,4). The inverse is (1,2,3,4,5)
in cycle form and (2,3,4,5,1) in one-line form.
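Conversions like these are easy to automate. Here is a minimal Python sketch (the function names are hypothetical) that turns cycle form into one-line form, recovers the standard cycle form (each cycle starting with its smallest element, cycles ordered by first element), and inverts a permutation given in one-line form:

```python
def cycles_to_one_line(cycles, n):
    """Convert a list of cycles, e.g. [(1, 5, 7, 8), (2, 3)], to one-line form on {1..n}."""
    image = list(range(1, n + 1))          # start from the identity
    for cycle in cycles:
        for i, x in enumerate(cycle):
            image[x - 1] = cycle[(i + 1) % len(cycle)]   # x -> next element of its cycle
    return tuple(image)

def one_line_to_cycles(perm):
    """Standard cycle form: each cycle starts with its smallest element."""
    seen, cycles = set(), []
    for start in range(1, len(perm) + 1):
        if start not in seen:
            cycle, x = [], start
            while x not in seen:
                seen.add(x)
                cycle.append(x)
                x = perm[x - 1]
            cycles.append(tuple(cycle))
    return cycles

def inverse(perm):
    """Inverse of a permutation in one-line form."""
    inv = [0] * len(perm)
    for i, y in enumerate(perm, start=1):
        inv[y - 1] = i
    return tuple(inv)

p = cycles_to_one_line([(1, 5, 7, 8), (2, 3), (4,), (6,)], 8)
print(p, inverse(p), one_line_to_cycles(inverse(p)))
```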


Write one entire set of interchanges as a permutation in cycle form. The interchanges
can be written as (1,3), (1,4) and (2,3). Thus the entire set gives $1 \to 3 \to 2$, $2 \to 3$,
$3 \to 1 \to 4$ and $4 \to 1$. In cycle form this is (1,2,3,4). Since four applications return
every element to where it started, five applications take 1 to 2.


(a) Imagine writing the permutation in cycle form. Look at the cycle containing 1,
starting with 1. There are $n-1$ choices for the second element of the cycle AND
then $n-2$ choices for the third element AND $\cdots$ AND $n-k+1$ choices for the $k$th
element. Hence the number of permutations in which the cycle generated by 1
has length $n$ is $(n-1)!$, by the Rule of Product and the above
result with $k = n$.

(b) For how many permutations does the cycle generated by 1 have length $k$? We
write the cycle containing 1 in cycle form as above AND then permute the remaining
$n-k$ elements in any fashion. For the $k$-long cycle containing 1, the above result
gives $(n-1)!/(n-k)!$ choices. There are $(n-k)!$ permutations on a set of size $n-k$. Putting
this all together using the Rule of Product, we get $(n-1)!$, a result which does not
depend on $k$.
(c) Since 1 must belong to some cycle and the possible cycle lengths are $1,2,\ldots,n$,
summing the answer to (b) over $1 \le k \le n$ will count all permutations of $n$ exactly
once. In our case, the sum is $(n-1)! + \cdots + (n-1)! = n \times (n-1)! = n!$.


This problem has shown that if you pick a random element in a permutation of an 
n-set, then the length of the cycle it belongs to is equally likely to be any of the values 
from 1 to n. 
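A brute-force check of this conclusion for $n = 5$: the Python sketch below counts, over all $5! = 120$ permutations, the length of the cycle containing 1. (The helper name is illustrative.)

```python
from collections import Counter
from itertools import permutations

def cycle_length_of_1(perm):
    """Length of the cycle containing 1; perm is in one-line form on {1, ..., n}."""
    length, x = 1, perm[0]          # perm[0] is the image of 1
    while x != 1:
        x = perm[x - 1]
        length += 1
    return length

n = 5
counts = Counter(cycle_length_of_1(p) for p in permutations(range(1, n + 1)))
print(sorted(counts.items()))       # each length 1..n occurs (n-1)! times
```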


Let $e$ be the identity permutation of $A$. Since $e \circ f = f$ for any permutation $f$ of $A$, we
have $e \circ e = e$. Applying this many times, $e^k = e \circ e \circ \cdots \circ e = e$ for any $k > 0$. We will
use this in discussing the solution.

(a) We can step around the cycle as in Example 8 and see that after 3 steps we are back
where we started from. Three hundred steps simply does this one hundred times.
Instead of phrasing it this way, we could say $(1,2,3)^3 = e$ and so
$(1,2,3)^{300} = \big((1,2,3)^3\big)^{100} = e^{100} = e$.


(b) Since we step around each cycle separately, 


(GIGS) SE. Opa mS en Se 


(c) A permutation of a $k$-set cannot have a cycle longer than $k$. Thus the possible
cycle lengths for permutations of 5 are 1, 2, 3, 4 and 5. A cycle of any of
these lengths raised to the 60th power is the identity, since 60 is a multiple of each of them.
For example, $(a,b,c,d)^{60} = \big((a,b,c,d)^4\big)^{15} = e^{15} = e$. Thus $f^{60} = e$. Finally,
$f^{61} = f^{60} \circ f = e \circ f = f$.


(a) The domain and range of f are specified and f takes on exactly two distinct values. 
f is not an injection. Since we don’t know the values f takes, f is not completely 
specified; however, it cannot be a surjection because it would have to take on all four 
values in its range. 


(b) Since each block in the coimage has just one element, f is an injection. Since 
|Coimage(f)| = 5 = |range of f|, f is a surjection. Thus f is a bijection and, since the 
range and domain are the same, f is a permutation. In spite of all this, we don’t know 
the function; for example, we don’t know f(1), but only that it differs from all other 
values of f. 


(c) We know the domain and range of $f$. From $f^{-1}(2)$ and $f^{-1}(4)$, we can determine
the values $f$ takes on the union $f^{-1}(2) \cup f^{-1}(4) = \{1,2,3,4,5\}$, the whole domain. Thus we know $f$ completely. It
is neither a surjection nor an injection.


(d) This function is a surjection, cannot be an injection and has no values specified. 


(e) This specification is nonsense. Since the image is a subset of the range, it cannot 
have more than four elements. 


(f) This specification is nonsense. The number of blocks in the coimage of f equals 
the number of elements in the image of f, which cannot exceed four. 


(a) The coimage of a function is a partition of the domain with one block for each 
element of Image(f). 


(b) You can argue this directly or apply the previous result. In the latter case, note 
that since Coimage(f) is a partition of A, |Coimage(f)| = |A| if and only if each block 
of Coimage(f) contains just one element. On the other hand, f is an injection if and
only if no two elements of A belong to the same block of Coimage(f). 


(c) By the first part, this says that $|\mathrm{Image}(f)| = |B|$. Since $\mathrm{Image}(f)$ is a subset of $B$, it
must equal $B$.


(a) The list is 321, 421, 431, 432, 521, 531, 532, 541, 542, 543.

(b) The first number is $\binom{x_1-1}{3} + \binom{x_2-1}{2} + \binom{x_3-1}{1} + 1 = \binom{2}{3} + \binom{1}{2} + \binom{0}{1} + 1 = 1$. The
last number is $\binom{4}{3} + \binom{3}{2} + \binom{2}{1} + 1 = 10$. The numbers $\binom{x_1-1}{3} + \binom{x_2-1}{2} + \binom{x_3-1}{1} + 1$
are, consecutively, $1, 2, \ldots, 10$ and represent the positions of the corresponding strings
$x_1x_2x_3$ in the list.

(c) The list is 123, 124, 125, 134, 135, 145, 234, 235, 245, 345.

(d) If, starting with the list of (c), you form the list $(6 - x_1)(6 - x_2)(6 - x_3)$, you get
543, 542, 541, 532, 531, 521, 432, 431, 421, 321, which is the list of (a) in reverse order.
Thus the formula of (b), applied to $(6 - x_1)(6 - x_2)(6 - x_3)$, gives the position of $x_1x_2x_3$
counted from the end of the list (c). To get the position in forward order, compute
$11 - p(6 - x_1, 6 - x_2, 6 - x_3)$.


(e) Successor: 98421. Predecessor: 97654. 


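(f) Let $x_1 = 9$, $x_2 = 8$, $x_3 = 3$, $x_4 = 2$ and $x_5 = 1$. Using the idea in part (b) of this
exercise, the answer is

$$\binom{x_1-1}{5} + \binom{x_2-1}{4} + \binom{x_3-1}{3} + \binom{x_4-1}{2} + \binom{x_5-1}{1} = \binom{8}{5} + \binom{7}{4} + \binom{2}{3} + \binom{1}{2} + \binom{0}{1} = 56 + 35 + 0 + 0 + 0 = 91.$$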


(a) The first distribution of balls to boxes corresponds to the strictly decreasing string 
863. The next such string in lex order on all strictly decreasing strings of length 3 
from 8 is 864. To get the corresponding distribution, place the three moveable box 
boundaries under positions 8, 6, and 4 and put balls under all other positions in 8. The 
predecessor to 863 is 862. The second distribution corresponds to 542. Its successor is 
543, its predecessor is 541. 

(b) The formula $p(x_1, x_2, x_3) = \binom{x_1-1}{3} + \binom{x_2-1}{2} + \binom{x_3-1}{1} + 1$ gives the position of
the string $x_1x_2x_3$ in the list of decreasing strings of length three from 8. We solve
the equation $p(x_1, x_2, x_3) = \binom{8}{3}/2 = 28$ for the variables $x_1, x_2, x_3$. Equivalently, find
$x_1, x_2, x_3$ such that $\binom{x_1-1}{3} + \binom{x_2-1}{2} + \binom{x_3-1}{1} = 27$. First try to choose $x_1 - 1$ as large as
possible so that $\binom{x_1-1}{3} \le 27$. A little checking gives $x_1 - 1 = 6$, with $\binom{6}{3} = 20$.
Subtracting, $27 - 20 = 7$. Now choose $x_2 - 1$ as large as possible so that $\binom{x_2-1}{2} \le 7$.
This gives $x_2 - 1 = 4$ with $\binom{4}{2} = 6$. Now subtract $7 - 6 = 1$ and choose
$x_3 - 1 = 1$. Thus $(x_1, x_2, x_3) = (7, 5, 2)$. The first element in the second half of the
list is the next one in lex order after 752, which is 753. The corresponding distributions
of balls into boxes can be obtained in the usual way.
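The position formula and the greedy inversion used in (b) can be sketched in Python as follows. The function names are hypothetical; `string_at` chooses $x_1 - 1$, then $x_2 - 1$, and so on as large as possible, exactly as in the hand computation. The last usage line evaluates the binomial sum from (f) of the previous exercise; without the final $+1$ it counts the strings that precede 98321 in lex order.

```python
from itertools import combinations
from math import comb

def position(xs):
    """1-based position of the strictly decreasing string xs = (x1, ..., xk)
    in lex order: C(x1-1, k) + C(x2-1, k-1) + ... + C(xk-1, 1) + 1."""
    k = len(xs)
    return sum(comb(x - 1, k - i) for i, x in enumerate(xs)) + 1

def string_at(p, k):
    """Inverse of position(): pick x1-1, x2-1, ... as large as possible, greedily."""
    remaining, xs = p - 1, []
    for j in range(k, 0, -1):
        t = j - 1                          # C(j-1, j) = 0, so this always fits
        while comb(t + 1, j) <= remaining:
            t += 1
        remaining -= comb(t, j)
        xs.append(t + 1)
    return tuple(xs)

# Brute-force list of decreasing strings of length 3 from {1,...,5}, in lex order.
listing = sorted(tuple(sorted(c, reverse=True)) for c in combinations(range(1, 6), 3))
print([position(s) for s in listing])      # 1, 2, ..., 10
print(string_at(28, 3))                    # middle of the length-3 list from {1,...,8}
print(position((9, 8, 3, 2, 1)) - 1)       # binomial sum from (f)
```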


(a) 2,2,3,3 is not a restricted growth (RG) function because it doesn’t start with 1. 
1, 2,3,3,2,1 is a restricted growth function. It starts with 1 and the first occurrence 
of each integer is exactly one greater than the maximum of all previous integers. 
1,1,1,3,3 is not an RG function. The first occurrence of 3 is two greater than the max
of all previous integers. 
1,2,3,1 is an RG function. 


(b) We list the blocks $f^{-1}(i)$ in order of $i$. Observe that all partitions of 4 occur exactly
once as coimages of the RG functions.

1111 → {1,2,3,4}          1112 → {1,2,3}, {4}         1121 → {1,2,4}, {3}
1122 → {1,2}, {3,4}       1123 → {1,2}, {3}, {4}      1211 → {1,3,4}, {2}
1212 → {1,3}, {2,4}       1213 → {1,3}, {2}, {4}      1221 → {1,4}, {2,3}
1222 → {1}, {2,3,4}       1223 → {1}, {2,3}, {4}      1231 → {1,4}, {2}, {3}
1232 → {1}, {2,4}, {3}    1233 → {1}, {2}, {3,4}      1234 → {1}, {2}, {3}, {4}


(c) 11111, 11112, 11121, 11122, 11123,
11211, 11212, 11213, 11221, 11222,
11223, 11231, 11232, 11233, 11234 → {{1,2}, {3}, {4}, {5}}
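Restricted growth functions can be generated by extending a prefix one value at a time: each new value either repeats a value already used or is one more than the largest so far. This Python sketch (names illustrative) lists the RG functions of length 4 in lex order together with their coimage partitions.

```python
def rg_functions(n):
    """All restricted growth functions of length n >= 1, in lex order."""
    def extend(prefix, largest):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # Reuse a value already seen, or start a new block with largest + 1.
        for v in range(1, largest + 2):
            yield from extend(prefix + [v], max(largest, v))
    yield from extend([1], 1)

def coimage(f):
    """The blocks f^{-1}(1), f^{-1}(2), ..., listed in order of i."""
    blocks = {}
    for position, value in enumerate(f, start=1):
        blocks.setdefault(value, []).append(position)
    return [set(blocks[i]) for i in sorted(blocks)]

for f in rg_functions(4):
    print("".join(map(str, f)), coimage(f))
```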


$S(6,3)\,(5)_3 = 90 \times 5 \times 4 \times 3 = 5{,}400$.


The set $B$ of balls is the domain and the set $C$ of cartons is the range. Every function
in $C^B$ describes a different way to put balls into cartons. Since 2 cartons are to remain
empty, we are interested in functions $f$ with $|\mathrm{Image}(f)| = 3$. Thus the answer to this
exercise is exactly the same as for the previous exercise.
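The count $S(6,3)\,(5)_3$ can be confirmed by brute force, assuming (as the formula suggests) 6 balls and 5 cartons with exactly 3 cartons used. A minimal Python sketch; `math.perm(5, 3)` is the falling factorial $(5)_3$, and `stirling2` is an illustrative helper.

```python
from itertools import product
from math import perm

def stirling2(n, k):
    """Stirling number of the second kind via S(n, k) = k S(n-1, k) + S(n-1, k-1)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

balls, cartons = 6, 5
# Each function from balls to cartons is a tuple giving the carton of each ball.
brute_force = sum(1 for f in product(range(cartons), repeat=balls) if len(set(f)) == 3)
formula = stirling2(balls, 3) * perm(cartons, 3)    # S(6,3) * (5)_3
print(brute_force, formula)
```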




By the theorem in the text and Example 14, these are all the same. By the method in 


Example 14, the answer is Eas) = (8) -_ (3) = 84. 


         Y=0    Y=1    Y=2    Y=3    Y=4    f_X
X=0     1/16     0      0      0      0     1/16
X=1       0    4/16     0      0      0     4/16
X=2       0    3/16   3/16     0      0     6/16
X=3       0      0    2/16   2/16     0     4/16
X=4       0      0      0      0    1/16    1/16
f_Y     1/16   7/16   5/16   2/16   1/16

The row index is X and the column index is Y.

$E(X) = 2$, $\mathrm{Var}(X) = 1$, $\sigma_X = 1$, $E(Y) = 1.69$, $\mathrm{Var}(Y) = 0.96$, $\sigma_Y = 0.98$

$\mathrm{Cov}(X, Y) = 0.87$

$\rho(X, Y) = 0.87/\big((1)(0.98)\big) = +0.89$. Since the correlation is close to 1, X and Y move
up and down together. In fact, you can see from the table for the joint distribution
that X and Y are often equal.
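The summary numbers can be recomputed exactly from the joint table. This Python sketch reads the entries (in sixteenths) off the table above; the helper name `expect` is illustrative. It gives $\mathrm{Cov}(X,Y) = 7/8 = 0.875$, which the text reports as 0.87.

```python
from fractions import Fraction
from math import sqrt

# joint[x][y] = P(X = x, Y = y) in sixteenths, read off the table above.
sixteenths = [
    [1, 0, 0, 0, 0],
    [0, 4, 0, 0, 0],
    [0, 3, 3, 0, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 0, 0, 1],
]
cells = [(x, y, Fraction(c, 16))
         for x, row in enumerate(sixteenths)
         for y, c in enumerate(row)]

def expect(g):
    """E[g(X, Y)] under the joint distribution."""
    return sum(g(x, y) * p for x, y, p in cells)

EX, EY = expect(lambda x, y: x), expect(lambda x, y: y)
VarX = expect(lambda x, y: x * x) - EX ** 2
VarY = expect(lambda x, y: y * y) - EY ** 2
Cov = expect(lambda x, y: x * y) - EX * EY
rho = float(Cov) / sqrt(VarX * VarY)

print(EX, VarX, EY, VarY, Cov, round(rho, 2))
```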




(a) You should be able to supply reasons for each of the following steps:

$$\begin{aligned}
\mathrm{Cov}(aX + bY, aX - bY) &= E[(aX + bY)(aX - bY)] - E[aX + bY]\,E[aX - bY] \\
&= E[a^2X^2 - b^2Y^2] - [aE(X) + bE(Y)][aE(X) - bE(Y)] \\
&= E[a^2X^2 - b^2Y^2] - [a^2(E(X))^2 - b^2(E(Y))^2] \\
&= a^2[E(X^2) - (E(X))^2] - b^2[E(Y^2) - (E(Y))^2] \\
&= a^2\,\mathrm{Var}(X) - b^2\,\mathrm{Var}(Y).
\end{aligned}$$

Alternatively, using the bilinear and symmetric properties of Cov:

$$\begin{aligned}
\mathrm{Cov}(aX + bY, aX - bY) &= a^2\,\mathrm{Cov}(X, X) - ab\,\mathrm{Cov}(X, Y) + ba\,\mathrm{Cov}(Y, X) - b^2\,\mathrm{Cov}(Y, Y) \\
&= a^2\,\mathrm{Var}(X) - b^2\,\mathrm{Var}(Y).
\end{aligned}$$
Here is the calculation:

$$\begin{aligned}
\mathrm{Var}[(aX + bY)(aX - bY)] &= \mathrm{Var}[a^2X^2 - b^2Y^2] \\
&= a^4\,\mathrm{Var}(X^2) - 2a^2b^2\,\mathrm{Cov}(X^2, Y^2) + b^4\,\mathrm{Var}(Y^2).
\end{aligned}$$


We begin our calculations with no assumptions about the distribution for $X$. Expand
the argument of the expectation and then use linearity of expectation to obtain

$$E\big((aX + b)^2\big) = E(a^2X^2 + 2abX + b^2) = a^2E(X^2) + 2abE(X) + b^2.$$

(The last term comes from the fact that $E(b^2) = b^2$ since $b^2$ is a constant.) By
definition, $\mathrm{Var}(X) + (E(X))^2 = E(X^2)$. Thus

$$E\big((aX + b)^2\big) = a^2\big(\mathrm{Var}(X) + (E(X))^2\big) + 2abE(X) + b^2.$$

With a little algebra this becomes

$$E\big((aX + b)^2\big) = a^2\,\mathrm{Var}(X) + \big(aE(X) + b\big)^2.$$

Specializing to the particular distributions for parts (a) and (b), we have the following.

(a) $E\big((aX + b)^2\big) = a^2np(1 - p) + (anp + b)^2$.

(b) $E\big((aX + b)^2\big) = a^2\lambda + (a\lambda + b)^2$.
We make the dubious assumption that the misprints are independent of one another.
(This would not be the case if the person preparing the book were more careless at
some times than at others.)

Focus your attention on page 8. Go one by one through the misprints $m_1, m_2, \ldots, m_{200}$,
asking the question, "Is misprint $m_i$ on page 8?"

By the assumptions of the problem, the probability that the answer is "yes" for
each $m_i$ is 1/100. Thus, we are dealing with the binomial distribution $b(k; 200, 1/100)$.
The probability of there being fewer than four misprints on page 8 is

$$\sum_{k=0}^{3} b(k; 200, 1/100) = \sum_{k=0}^{3} \binom{200}{k} (1/100)^k (99/100)^{200-k}.$$

Using a calculator, we find the sum to be 0.858034.
Using the Poisson approximation, we set $\lambda = np = 2$ and compute the easier sum

$$e^{-2}2^0/0! + e^{-2}2^1/1! + e^{-2}2^2/2! + e^{-2}2^3/3!,$$

which is 0.857123 according to our calculator.
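Both sums are short enough to evaluate directly. A minimal Python sketch using only the standard library:

```python
from math import comb, exp, factorial

n, p = 200, 1 / 100     # 200 misprints, each on page 8 with probability 1/100
lam = n * p             # Poisson parameter, np = 2

binomial = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(4))
poisson = sum(exp(-lam) * lam**k / factorial(k) for k in range(4))

print(f"binomial {binomial:.6f}, Poisson {poisson:.6f}")
```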


From the definition of $Z$ and the independence of $X$ and $Y$, Tchebycheff's inequality
states that

$$P\big(|Z - aE(X) - bE(Y)| \ge \epsilon\big) \le \frac{\mathrm{Var}(Z)}{\epsilon^2} = \frac{a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)}{\epsilon^2}.$$

Applying this to the two parts (a) and (b), we get

(a) $P\big(|Z - a\gamma - b\delta| \ge \epsilon\big) \le \dfrac{a^2\gamma + b^2\delta}{\epsilon^2}$

(b) $P\big(|Z - anr - bns| \ge \epsilon\big) \le \dfrac{a^2nr(1 - r) + b^2ns(1 - s)}{\epsilon^2}$

Fn-4.6 We are dealing with $b(k; 1000, 1/10)$. The mean is $np = 100$ and the variance is
$npq = 90$. The standard deviation is thus 9.49. The exact solution is

$$\sum_{k=85}^{115} b(k; 1000, 1/10) = \sum_{k=85}^{115} \binom{1000}{k} (1/10)^k (9/10)^{1000-k}.$$

Using a computer with multi-precision arithmetic, the exact answer is 0.898. To apply
the normal distribution, we would compute the probability of the event [100, 115]
using the normal distribution with mean 100 and standard deviation 9.49. In terms of
the standard normal distribution, we compute the probability of the event
$[0, (115 - 100)/9.49] = [0, 1.6]$ (rounded off). If you have access to values for areas under the
standard normal distribution, you can find that the probability is 0.445. We double
this to get the approximate answer: 0.89.

Fn-4.7 We have


$$E(\bar X) = E\big((1/n)(X_1 + \cdots + X_n)\big) = (1/n)\,E(X_1 + \cdots + X_n) = (1/n)\big(E(X_1) + \cdots + E(X_n)\big) = (1/n)(n\mu) = \mu,$$

$$\mathrm{Var}(\bar X) = \mathrm{Var}\big((1/n)(X_1 + \cdots + X_n)\big) = (1/n)^2\,\mathrm{Var}(X_1 + \cdots + X_n) = (1/n)^2\big(\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)\big) = (1/n)^2(n\sigma^2) = \sigma^2/n.$$

Since $\bar X$ has mean $\mu$, it is a reasonable approximation to $\mu$. Of course, it's important
to know something about the accuracy.

(c) Since $\mathrm{Var}(\bar X) = \sigma^2/n$, we have $\sigma_{\bar X} = \sigma/\sqrt{n}$. If we change from $n$ to $N$, $\sigma_{\bar X}$ changes
to $\sigma/\sqrt{N}$. Since we want to improve accuracy by a factor of 10, we want
$\sigma/\sqrt{N} = (1/10)(\sigma/\sqrt{n})$. After some algebra, this gives us $N = 100n$. In other words,
we need to do 100 times as many measurements!
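Returning to Fn-4.6, the exact binomial sum and the normal approximation can be evaluated with the Python standard library, as in this sketch. Note that it uses the unrounded $z = 15/\sqrt{90} \approx 1.58$ rather than 1.6, so its normal approximation comes out slightly lower than the 0.89 quoted in the text.

```python
from math import comb, erf, sqrt

n, p = 1000, 1 / 10
mean, sd = n * p, sqrt(n * p * (1 - p))

# Exact: sum of b(k; 1000, 1/10) for 85 <= k <= 115.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(85, 116))

# Normal approximation as in the text: P(-z <= Z <= z) with z = 15/sd,
# which equals erf(z / sqrt(2)) for a standard normal Z.
z = (115 - mean) / sd
approx = erf(z / sqrt(2))

print(round(sd, 2), round(exact, 3), round(approx, 3))
```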
PREV: C, CC, CCV, CCVC, CCVCC, CCVCV, CV, CVC, CVCC, CVCCV, CVCV,
CVCVC, V, VC, VCC, VCCV, VCCVC, VCV, VCVC, VCVCC, VCVCV.

POSV: CCVCC, CCVCV, CCVC, CCV, CC, CVCCV, CVCC, CVCVC, CVCV,
CVC, CV, C, VCCVC, VCCV, VCC, VCVCC, VCVCV, VCVC, VCV, VC, V.

BFV: C, V, CC, CV, VC, CCV, CVC, VCC, VCV, CCVC, CVCC, CVCV, VCCV,
VCVC, CCVCC, CCVCV, CVCCV, CVCVC, VCCVC, VCVCC, VCVCV.
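The three orders can be generated directly from the tree. The sketch below infers the tree from the lists themselves, so treat that rule as an assumption: the vertices are the words of length at most 5 over {C, V} with no two adjacent V's and no three adjacent C's, children are taken in the order C then V, and the empty root is not listed. PREV and POSV are then ordinary recursive preorder and postorder, and BFV uses a queue.

```python
from collections import deque

MAX_LEN = 5  # assumed: the leaves are words of length 5


def children(word):
    """Children of a vertex, in the order C then V.

    Assumed rule (inferred from the lists): no word contains VV or CCC.
    """
    if len(word) == MAX_LEN:
        return []
    kids = [word + letter for letter in "CV"]
    return [w for w in kids if "VV" not in w and "CCC" not in w]


def prev_order(word=""):
    """Preorder: list a vertex, then its subtrees left to right (root omitted)."""
    order = [word] if word else []
    for child in children(word):
        order.extend(prev_order(child))
    return order


def posv_order(word=""):
    """Postorder: list the subtrees left to right, then the vertex."""
    order = []
    for child in children(word):
        order.extend(posv_order(child))
    if word:
        order.append(word)
    return order


def bfv_order():
    """Breadth first: visit the vertices level by level using a queue."""
    order, queue = [], deque(children(""))
    while queue:
        word = queue.popleft()
        order.append(word)
        queue.extend(children(word))
    return order


print(", ".join(prev_order()))
print(", ".join(posv_order()))
print(", ".join(bfv_order()))
```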


You will need the decision trees for lex and insertion order for permutations of 3 and 4. The text gives the tree for insertion order for 4, from which the tree for 3 can be found: just stop one level above the leaves of 4. You should construct the tree for lex order.

(a) To answer this, compare the leaves. For n = 3, the permutations σ = 123, 132, and 321 have RANK_L(σ) = RANK_I(σ). For n = 4, the permutations σ = 1234, 1243, and 4321 have RANK_L(σ) = RANK_I(σ).

(b) From the tree for (a), RANK_L(2314) = 8.

Rather than draw the large tree for 5, we use a smarter approach to compute RANK_L(45321) = 95. Note that all permutations on 5 that start with 1, 2, or 3 come before 45321. There are 3 × 24 = 72 of those. This leads us to the subtree for permutations of {1, 2, 3, 5} in lex order. It looks just like the decision tree for 4 with 4 replaced by 5. (Why is this?) Since RANK_L(4321) = 23, this makes a total of 72 + 23 = 95 permutations that come before 45321, and so RANK_L(45321) = 95. If you find this unclear, try drawing a picture to help you understand it.

(c) RANK_I(2314) = 16. What about RANK_I(45321)? First insert 1, then 2, and so on. After we have done all but 5, we are at the rightmost leaf of the tree for 4, which is 4321. It has 23 leaves to the left of it. When we insert 5, each of these leaves is replaced by 5 new leaves because there are 5 places to insert 5. This gives us 5 × 23 = 115 leaves. Finally, of the 5 places where we could insert 5 into 4321, we chose the 4th, so there are 3 additional leaves to the left of it. Thus the rank is 115 + 3 = 118.

(d) RANK_L(3241) = 15.
(e) RANK_I(4213) = 15.

(f) The first 24 permutations on 5 consist of 1 followed by a permutation on {2, 3, 4, 5}. Since our goal is the permutation of rank 15, it is in this set. By (d), RANK_L(3241) = 15 for n = 4. Replacing 1, 2, 3, 4 by 2, 3, 4, 5, we see that 4352 has RANK_L = 15 in the lex list of permutations on {2, 3, 4, 5}. Thus the permutation of rank 15 on 5 is 14352.
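Both ranking arguments above can be written as short functions. This is a sketch with names of our own choosing. For RANK_I it assumes the convention implied by (c): the new largest element n is inserted into the n places of the (n − 1)-permutation from right to left, so the place after the last element comes first.

```python
from math import factorial


def lex_rank(perm):
    """RANK_L: the number of permutations that precede perm in lex order."""
    remaining = sorted(perm)
    rank = 0
    for position, value in enumerate(perm):
        smaller = remaining.index(value)  # unused values smaller than this one
        rank += smaller * factorial(len(perm) - 1 - position)
        remaining.pop(smaller)
    return rank


def insertion_rank(perm):
    """RANK_I, assuming n is inserted into the places from right to left."""
    n = len(perm)
    if n == 1:
        return 0
    i = perm.index(n)
    rest = perm[:i] + perm[i + 1:]
    # every leaf left of `rest` spawns n leaves; then count places skipped
    return n * insertion_rank(rest) + (n - 1 - i)


def lex_unrank(rank, values):
    """Inverse of lex_rank: the permutation of `values` with the given rank."""
    remaining = sorted(values)
    perm = []
    while remaining:
        block = factorial(len(remaining) - 1)  # leaves under each child
        index, rank = divmod(rank, block)
        perm.append(remaining.pop(index))
    return perm


print(lex_rank((4, 5, 3, 2, 1)), insertion_rank((4, 5, 3, 2, 1)))  # 95 118
print(lex_unrank(15, [1, 2, 3, 4, 5]))  # [1, 4, 3, 5, 2]
```

The same `lex_rank` also reproduces the count 40,200 for 87612345 that appears in DT-2.1.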
DT-1.3 Here is the tree.

[Figure: decision tree with edges labeled A and B.]

The list in lex order:

ABABAB, ABABBA, ABBABA, ABBABB, BABABA, BABABB, BABBAB, BBABAB, BBABBA.


DT-1.4 Here is a decision tree for $D(6^4)$. The leaves correspond to the elements of $D(6^4)$ in lex order, obtained by reading the sequence of vertex labels from the root to the leaf.

[Figure: decision tree for $D(6^4)$; the first level is labeled 4, 5, 6, and reading the labels from the root to a leaf gives a decreasing sequence such as 6531.]

(a) The rank of 5431 is 3. The rank of 6531 is 10.

(b) 4321 has rank 0 and 6431 has rank 7.

(c) The first 5 leaves correspond to $D(5^4)$.

(d) $D(6^4)$ is bijectively equivalent to the set P(6,4) of all subsets of {1, 2, ..., 6} of size 4. Under this bijection, an element such as 5431 ∈ $D(6^4)$ corresponds to the set {1, 3, 4, 5}.

DT-1.5 For PREV and POSV, omit Step 2. For PREV, begin Step 3 with the sentence

“If you have not used any edges leading out from the vertex, list the vertex.”

For POSV, change Step 3 to

“If there are no unused edges leading out from the vertex, list the vertex and go to Step 4; otherwise, go to Step 5.”


DT-1.6 The problem is that the eight hibachi grills, though different as domino coverings, are all equivalent or “isomorphic” once they are made into grills. All eight in the first row below can be gotten by rotating and/or turning over the first grill.

[Figure: first row, the eight equivalent coverings; second row, the nine non-equivalent grills labeled (1) to (9).]

The codes of the coverings in the first row are vvhvvhhh, vvhhvhvh, hhvvhvvh, vhvhhvvh, hvvhvhvh, hvvvvhhh, vhvhvvhh, hhhvvvvh.

There are nine different grills, as shown in the picture. These nine might be called a “representative system” for the domino coverings up to “grill equivalence.” Note that these nine representatives are listed in lex order according to their codes (starting with hhhhhhhh and ending with hvvhvvhh). They each have another interesting property: each one is lexicographically minimal among all patterns equivalent to it. The one we selected from the list of “screwup” grills (number (6)) has code hhhvvvvh, and that is minimal among all codes in the first row of coverings.

This problem is representative of an important class of problems called “isomorph rejection problems.” The technique we have illustrated, selecting a lex minimal system of representatives up to some sort of equivalence relation, is an important technique in this subject.

DT-2.1 We refer to the decision tree in Example 10. The permutation 87612345 specifies, by edge labels, a path from the root L(8) to a leaf in the decision tree. To compute the rank, we must compute the number of leaves “abandoned” by each edge, just as was done in Example 14. There are eight edges in the path, and the number of abandoned leaves is 7 × 7! + 6 × 6! + 5 × 5! + 0 + 0 + 0 + 0 + 0 = 35,280 + 4,320 + 600 = 40,200. This is the RANK of 87612345 in the lex list of permutations on 8. Note that 8! = 40,320, so the permutation of RANK 8!/2 = 20,160 is the first one in the second half of the list: 51234678.

DT-2.2 (a) The corresponding path in the decision tree is H(8, S, E, G), H(7, E, S, G), H(6, S, E, G), H(5, S, G, E), H(4, S, E, G), H(3, E, S, G), $E \xrightarrow{3} G$.

(b) The move that produced the configuration of (a) was $E \xrightarrow{3} G$. The configuration prior to that was Pole S: 6, 5, 2, 1; Pole E: 3; Pole G: 8, 7, 4.

(c) The move just prior to $E \xrightarrow{3} G$ was $G \xrightarrow{1} S$. This is seen from the decision tree structure or from the fact that the smallest washer, number 1, moves every other time in the pattern S, E, G, S, E, G, etc. The configuration just prior to the move $G \xrightarrow{1} S$ was Pole S: 6, 5, 2; Pole E: 3; Pole G: 8, 7, 4, 1.

(d) The next move after $E \xrightarrow{3} G$ will be another move by washer 1 in its tiresome cycle S, E, G, S, E, G, etc. That will be $S \xrightarrow{1} E$.

(e) The RANK of the move that produced (a) can be computed by summing the abandoned leaves associated with each edge of the path (a) in the decision tree. (See Example 14.) There are six edges in the path of part (a), with associated abandoned leaves $2^7 = 128$, $2^6 = 64$, 0, 0, $2^3 = 8$, and $2^2 - 1 = 3$. The total is 203.

DT-2.3 (a) 110010000 is preceded by 110010001 and is followed by 110110000. You can find this by first drawing the path from the root to 110010000. You will find that the last edge of the path goes to the right. Therefore, we can get the preceding element by going to the left instead. This changes the last element from 0 to 1, and all other elements remain fixed. To get the element that follows it, we want to branch to the right instead of the left. The last five edges to 110010000 all go to the right, and the edge just before them, say e, goes to the left. Instead of taking e, we take the edge that goes to the right. Now what? We must take edges to the left after this so that we end up as close to the original leaf 110010000 as possible. A trick: since we are dealing with a Gray code, we know that there is only one change, so once we have found it we can just copy everything else. In this case we changed the fourth symbol of 110010000 (from 0 to 1), and so the others are the same.
(b) The first element of the second half of the list corresponds to a path in the decision tree that starts with a right-sloping edge and has all of the remaining eight edges left-sloping. That element is 110000000.

(c) Each right-sloping edge abandons $2^{9-k}$ leaves if the edge is the kth one in the path. For the path 111111111 the right-sloping edges are numbers 1, 3, 5, 7, and 9 (remember, after the first edge, a label 1 causes the direction of the path to change). Thus, the rank of 111111111 is $2^8 + 2^6 + 2^4 + 2^2 + 2^0 = 341$.

(d) To compute the element of RANK 372, we first compute the path in the decision tree that corresponds to the element. The first edge must be (1) right sloping (abandoning 256 leaves), since the largest rank of any leaf at the end of a path that starts left sloping is $2^8 - 1 = 255$. We apply this same reasoning recursively. The right-sloping edge leads to 256 leaves. We wish to find the leaf of RANK 372 − 256 = 116 in that list of 256 leaves. That means the second edge must be (1) left sloping (abandoning 0 leaves), so our path starts off (1) right sloping, (1) left sloping. This path can access 128 leaves. We want the leaf of RANK 116 − 0 = 116 in this list. Thus we must access a leaf in the second half of the list of 128, so the third edge must be (1) right sloping (abandoning 64 leaves). In that second half we must find the leaf of RANK 116 − 64 = 52.

Our path is now (1) right sloping, (1) left sloping, (1) right sloping. Following that path leads to 64 leaves, of which we want the leaf of RANK 52. Thus, the fourth edge must be (0) right sloping (abandoning 32 leaves). This path of four edges leads to 32 leaves, of which we must find the one of RANK 52 − 32 = 20. Thus the fifth edge must also be (0) right sloping (abandoning 16 leaves), and we must find the leaf of RANK 20 − 16 = 4. This means that the sixth edge must be (1) left sloping (abandoning 0 leaves), the seventh edge must be (1) right sloping (abandoning 4 leaves), and the last two edges must be left sloping: (1) left sloping (abandoning 0 leaves), (0) left sloping (abandoning 0 leaves). Thus the final path is 111001110.
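As a cross-check on DT-2.3, the standard XOR formula for the reflected binary Gray code (the element of rank r is r XOR (r >> 1)) can be used in place of the decision tree. This sketch assumes that the Gray code tree in the text produces the reflected binary Gray code, which the ranks computed above agree with; the function names are ours.

```python
BITS = 9


def gray_element(rank, bits=BITS):
    """Element of the given rank in the reflected binary Gray code, as a bit string."""
    return format(rank ^ (rank >> 1), f"0{bits}b")


def gray_rank(word):
    """Inverse of gray_element: running XOR of the code bits gives the binary rank."""
    rank, bit = 0, 0
    for ch in word:
        bit ^= int(ch)  # XOR of all code bits seen so far
        rank = 2 * rank + bit
    return rank


r = gray_rank("110010000")
print(gray_element(r - 1), gray_element(r + 1))  # neighbours in the list
print(gray_element(2 ** (BITS - 1)))             # first element of the second half
print(gray_rank("111111111"), gray_element(372))
```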


(a) Let A(n) be the assertion “H(n, S, E, G) takes the least number of moves.” Clearly A(1) is true since only one move is required. Now let n > 1 and assume A(n − 1); we prove A(n). Note that to do $S \xrightarrow{n} G$ (moving the largest washer) we must first move all the other washers to pole E. They can be stacked only one way on pole E, so moving the washers from S to E requires using a solution to the Towers of Hanoi problem for n − 1 washers. By A(n − 1), this is done in the least number of moves by H(n − 1, S, G, E). Similarly, H(n − 1, E, S, G) moves these washers to G in the least number of moves.
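The recursion for H(n, S, E, G) is easy to simulate, which gives a mechanical check on the answers to DT-2.2. This sketch assumes that H(n, S, E, G) means “move n washers from S to G using E” and is carried out as H(n − 1, S, G, E), then the move of washer n from S to G, then H(n − 1, E, S, G), which matches the path found in DT-2.2 (a); moves are counted from RANK 0, and the helper names are illustrative.

```python
def hanoi(n, src, via, dst, moves):
    """Append the moves of H(n, src, via, dst) as (washer, from_pole, to_pole)."""
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, via, moves)  # clear the smaller washers onto `via`
    moves.append((n, src, dst))         # move washer n
    hanoi(n - 1, via, src, dst, moves)  # bring the smaller washers back on top
    return moves


def configuration_before(n, moves, rank):
    """Pole contents (bottom to top) just before the move of the given RANK."""
    poles = {"S": list(range(n, 0, -1)), "E": [], "G": []}
    for _washer, src, dst in moves[:rank]:
        poles[dst].append(poles[src].pop())
    return poles


moves = hanoi(8, "S", "E", "G", [])
print(len(moves))  # 255 = 2**8 - 1
print(moves[202], moves[203], moves[204])
print(configuration_before(8, moves, 203))
```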


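(b) For n = 1: $S \xrightarrow{1} G$.
For n = 2: $S \xrightarrow{1} E$, $S \xrightarrow{2} G$, $E \xrightarrow{1} G$.
For n = 3: $S \xrightarrow{1} E$, $S \xrightarrow{2} F$, $S \xrightarrow{3} G$, $F \xrightarrow{2} G$, $E \xrightarrow{1} G$.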


(c) Let s(p, q) be the number of moves for G(p, q, S, E, F, G). The recursive step in the problem is described for p > 0, so the simplest case is p = 0, where $s(0, q) = h_q = 2^q - 1$. In that case, (i) tells us what to do.

Otherwise, the number of moves in (ii) is $s(p, q) = 2s(i, j) + h_q$. To find the minimum, we look at all allowed values of i and j and choose those for which s(i, j) is a minimum. This choice of i and j, when used in (ii), tells us which moves to make. In the following table, the numbers labeling the rows are values of p and those labeling the columns are values of q. Except for the $s_n$ column (where n is the row label), the entries are s(p, q). The p = 0 row is $h_q$ by (i). To find s(p, q) for p > 0, we use (ii). To do this, we look along the diagonal whose indices sum to p, choose the minimum (its location is (i, j)), double it and add $h_q$. For example, s(5, 2) is found by taking the minimum of the diagonal entries at (0, 5), (1, 4), (2, 3), (3, 2), and (4, 1). Since these entries are 31, 17, 13, 13, and 19, the minimum is 13. Since this occurs at (2, 3) and (3, 2), we have a choice for (i, j). Either one gives us $2 \times 13 + h_2 = 29$ moves. To compute $s_n$, we simply look along the p + q = n diagonal and choose the minimum.

 p | s_n | q=1   q=2   q=3   q=4   q=5   q=6
 0 |   0 |   1     3     7    15    31    63
 1 |   1 |   3     5     9    17    33    65
 2 |   3 |   7     9    13    21    37
 3 |   5 |  11    13    17    25
 4 |   9 |  19    21    25
 5 |  13 |  27    29
 6 |  17 |  35

(d) From the description of the algorithm,
• $s(p, q) = 2\min s(i, j) + h_q$, where the minimum is over i + j = p, and
• $s_n = \min s(p, q)$, where the minimum is over p + q = n.

Putting these together gives us $s(p, q) = 2s_p + h_q$, and so $s_n = \min(2s_p + h_q)$. The initial condition is $s_0 = 0$. In summary,

$$s_n = \begin{cases} 0 & \text{if } n = 0,\\ \displaystyle\min_{p+q=n,\; q>0} (2s_p + h_q) & \text{if } n > 0. \end{cases}$$


(e) Change the recursive procedure in the algorithm to use the moves for $f_{n-j}$ instead of using those for s(p, q). It follows that we can solve the puzzle in $2f_{n-j} + h_j$ moves.
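The recursion in (d) can be evaluated directly to rebuild the table. This is a minimal sketch: h(q) = 2^q − 1 is the three-pole count from (c), and the function names simply mirror the symbols in the text.

```python
from functools import lru_cache


def h(q):
    """Moves for q washers on three poles: 2^q - 1."""
    return 2 ** q - 1


@lru_cache(maxsize=None)
def s(n):
    """s_n from (d): 0 if n == 0, else the minimum over p + q = n, q > 0 of 2 s_p + h_q."""
    if n == 0:
        return 0
    return min(2 * s(n - q) + h(q) for q in range(1, n + 1))


def s_pq(p, q):
    """Table entry s(p, q) = 2 s_p + h_q."""
    return 2 * s(p) + h(q)


print([s(n) for n in range(7)])  # [0, 1, 3, 5, 9, 13, 17]
print(s_pq(5, 2))                # 29
```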


When there is replacement, the result of the first choice does not matter since the ball 
is placed back in the box. Hence the answer to both parts of (a) is 3/7. 


(b) If the first ball is green, we are drawing a ball from three white and three green 
and so the probability is 3/6 = 1/2. If the first ball is white, we are drawing a ball 
from two white and four green and so the probability is 2/6 = 1/3. 


There are five ways to get a total of six: 1 + 5, 2 + 4, 3 + 3, 4 + 2, and 5 + 1. All
five are equally likely and so each outcome has probability 1/5. We get the answers 
by counting the number that satisfy the given conditions and multiplying by 1/5: 


(a) 1/5, (b) 2/5, (c) 3/5. 
DT-3.3 Here is the decision tree for this problem.

[Figure: decision tree from the Root, branching first on the student's major and then on whether the student has read Hamlet (R) or not (∼R); each leaf X ∩ R or X ∩ ∼R is labeled with its probability.]

(a) We want to compute the conditional probability that a student is a humanities major, given that that student has read Hamlet. In the decision tree, if we follow the path from the Root to H to H ∩ R, we get a probability of .06 at the leaf. We must divide this by the sum over all probabilities of such paths that end at X ∩ R (as opposed to X ∩ ∼R). That sum is 0.01 + 0.20 + 0.06 + 0.06 = 0.33. The answer is 0.06/0.33 = 0.182.

(b) We compute the probability that a student has not read Hamlet and is a P (Physical Science) or E (Engineering) major: 0.09 + 0.20 = 0.29. We must divide this by the sum over all probabilities of such paths that end at X ∩ ∼R (as opposed to X ∩ R), which is 1 − 0.33 = 0.67. The answer is 0.29/0.67 = 0.433.


DT-3.4 Here is a decision tree where the vertices are urn compositions. The edges incident on the root are labeled with the outcome sets of the die and the probabilities that these sets occur. The edges incident on the leaves are labeled with the color of the ball drawn and the probability that such a ball is drawn. The leaves are labeled with the product of the probabilities on the edges leading from the root to that leaf.

[1R, 1W]
    {1, 2}, probability 1/3: [2R, 1W]
        red, 2/3: [1R, 1W], leaf probability 2/9
        white, 1/3: [2R, 0W], leaf probability 1/9
    {3, 4, 5, 6}, probability 2/3: [4R, 1W]
        red, 4/5: [3R, 1W], leaf probability 8/15
        white, 1/5: [4R, 0W], leaf probability 2/15

(a) To compute the conditional probability that a 1 or 2 appeared, given that a red ball was drawn, we take the probability 2/9 that a 1 or 2 appeared and a red ball was drawn and divide by the total probability that a red ball was drawn, 2/9 + 8/15 = 34/45. The answer is (2/9)/(34/45) = 5/17 = 0.294.

(b) We divide the probability that a 1 or 2 appeared and the final composition had more than one red ball (1/9) by the sum of the probabilities where the final composition had more than one red ball: 1/9 + 8/15 + 2/15 = 7/9. The answer is (1/9)/(7/9) = 1/7 = 0.143.
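Both conditional probabilities can be checked with exact fractions by listing the four leaves. The sketch reads the leaf data straight off the tree above (rolling 1 or 2 adds one red ball, rolling 3 to 6 adds three, and then one ball is drawn); the list layout and the `conditional` helper are illustrative.

```python
from fractions import Fraction as F

# One entry per leaf: (die outcome, colour drawn, reds left, whites left, probability)
leaves = [
    ("1-2", "red", 1, 1, F(1, 3) * F(2, 3)),
    ("1-2", "white", 2, 0, F(1, 3) * F(1, 3)),
    ("3-6", "red", 3, 1, F(2, 3) * F(4, 5)),
    ("3-6", "white", 4, 0, F(2, 3) * F(1, 5)),
]


def conditional(event, given):
    """P(event | given): joint leaf probability divided by the probability of `given`."""
    joint = sum(leaf[-1] for leaf in leaves if event(leaf) and given(leaf))
    total = sum(leaf[-1] for leaf in leaves if given(leaf))
    return joint / total


p_a = conditional(lambda leaf: leaf[0] == "1-2", lambda leaf: leaf[1] == "red")
p_b = conditional(lambda leaf: leaf[0] == "1-2", lambda leaf: leaf[2] > 1)
print(p_a, p_b)  # 5/17 1/7
```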


DT-3.5 A decision tree is shown below. The values of the random variable X are shown just below the amount remaining in the pot associated with each leaf. To compute E(X) we sum the values of X times the product of the probabilities along the path from the root to that value of X. Thus, we get

$$E(X) = 1 \times (1/2) + 2 \times (1/8) + (2 + 3 + 3 + 3 + 4 + 5) \times (1/16) = 2.$$

[Figure: decision tree in which every edge has probability 1/2; its leaves have X = 1 (probability 1/2), X = 2 (probability 1/8), and X = 2, 3, 3, 3, 4, 5 (probability 1/16 each).]


DT-3.6 A decision tree is shown below. Under the leaves is the length of the game (the height of the leaf). The expected length of the game is the sum, over all leaves, of the product of the probabilities on the edges of the path to the leaf times the height of that leaf:

$$2\big((1/3)^2 + (2/3)^2\big) + 4\big((1/3)^3(2/3) + (1/3)^2(2/3)^2 + (1/3)^2(2/3)^2 + (1/3)(2/3)^3\big) + 3\big((1/3)(2/3)^2 + (1/3)^2(2/3)\big).$$

This equals 10/9 + 8/9 + 2/3 = 8/3, so the expected number of games is about 2.67.

[Figure: binary decision tree in which A wins each game with probability 1/3 and B with probability 2/3; the leaves AA and BB have height 2, ABB and BAA have height 3, and ABAA, ABAB, BABA, BABB have height 4.]
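The expected length can also be computed by walking the tree recursively with exact fractions. The stopping rule is read off the tree and is an assumption of this sketch: play stops as soon as one player has won two games in a row, or after four games, and A wins each game with probability 1/3.

```python
from fractions import Fraction

WIN = {"A": Fraction(1, 3), "B": Fraction(2, 3)}  # probability of winning one game
MAX_GAMES = 4


def expected_length(history="", prob=Fraction(1)):
    """Sum of (path probability) * (height) over the leaves below `history`."""
    two_in_a_row = len(history) >= 2 and history[-1] == history[-2]
    if two_in_a_row or len(history) == MAX_GAMES:
        return prob * len(history)
    return sum(expected_length(history + player, prob * p) for player, p in WIN.items())


print(expected_length())  # 8/3
```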


DT-3.7 We are given
P(F′ | A) = 0.6, P(F | A′) = 0.8 and P(A) = 0.7.
You can draw a decision tree. The first level branches according to whether the air strike is successful (A) or not (A′). The probabilities, left to right, are 0.7 and 1 − 0.7 = 0.3. The second level branches according to whether there is enemy fire (F) or not (F′). To compute the conditional probabilities on the edges, note that
P(F | A) = 1 − P(F′ | A) = 1 − 0.6 = 0.4 and P(F′ | A′) = 1 − 0.8 = 0.2.

The leaves and their probabilities are

P(A ∩ F) = 0.7 × 0.4 = 0.28, P(A ∩ F′) = 0.7 × 0.6 = 0.42,
P(A′ ∩ F) = 0.3 × 0.8 = 0.24, P(A′ ∩ F′) = 0.3 × 0.2 = 0.06.

For (a), P(F′) = 0.42 + 0.06 = 0.48, and for (b)

$$P(A \mid F') = \frac{P(A \cap F')}{P(F')} = \frac{0.42}{0.48} = 0.875.$$


DT-4.1 (a) $a_n = 1$ for all n.
(b) $a_0 = 0$, $a_1 = 0 + a_0 = 0$, $a_2 = 1 + a_1 = 1$, $a_3 = 1 + a_2 = 2$, $a_4 = 2 + a_3 = 4$.

(c) $a_0 = 1$, $a_1 = 1 + a_0 = 2$, $a_2 = 2 + a_1 = 4$, $a_3 = 3 + a_1 = 5$, $a_4 = 4 + a_2 = 8$.

(d) $a_0 = 0$, $a_1 = 1$, $a_2 = 1 + a_1a_1 = 2$, $a_3 = 1 + \min(a_1a_2, a_2a_1) = 1 + a_1a_2 = 3$,
$a_4 = 1 + \min(a_1a_3, a_2a_2, a_3a_1) = 1 + \min(3, 4) = 4$.

DT-4.2 $a_n = \lfloor n/2 \rfloor$, $b_n = (-1)^n \lfloor 1 + (n/2) \rfloor = (-1)^n (1 + \lfloor n/2 \rfloor)$, $c_n = n^2 + 1$, $d_n = n!$.

DT-4.3 $x^2 - 6x + 5 = 0$ has roots $r_1 = 1$ and $r_2 = 5$;
$x^2 - x - 2 = 0$ has roots $r_1 = -1$ and $r_2 = 2$;
$x^2 - 5x - 5 = 0$ has roots $(5 \pm 3\sqrt{5})/2$.

DT-4.4 The characteristic equation is $x^2 - 6x + 9 = 0$, which factors as $(x - 3)^2 = 0$. Thus $r_1 = r_2 = 3$. We have $K_1 = a_0 = 0$ and $3K_2 = a_1 = 3$. Thus $a_n = n3^n$.

DT-4.5 Let $A_n = a_{n+2}$, so that $A_0 = 1$, $A_1 = 3$ and $A_n = 3A_{n-1} - 2A_{n-2}$ for n ≥ 2. The characteristic equation is $x^2 - 3x + 2 = 0$ and has roots $r_1 = 1$, $r_2 = 2$. Thus $K_1 + K_2 = 1$ and $K_1 + 2K_2 = 3$, and so $K_1 = -1$ and $K_2 = 2$. We have $A_n = -1 + 2 \times 2^n = 2^{n+1} - 1$, and so $a_n = A_{n-2} = 2^{n-1} - 1$.

DT-4.6 The characteristic equation is $x^2 - 2x + 1 = (x - 1)^2 = 0$. Thus $r_1 = r_2 = 1$, and so $K_1 = a_0 = 2$ and $K_1 + K_2 = a_1 = 1$. We have $K_2 = 1 - K_1 = -1$, and so $a_n = 2 - n$.

DT-4.7 (a) Let A(n) be the assertion that $G(n) = (1 - A^n)/(1 - A)$. When n = 1, G(1) = 1 and $(1 - A^1)/(1 - A) = 1$, so the base case is proved. For n > 1, we have

$$\begin{aligned}
G(n) &= 1 + A + \cdots + A^{n-1} && \text{by definition,}\\
&= (1 + A + \cdots + A^{n-2}) + A^{n-1} &&\\
&= \frac{1 - A^{n-1}}{1 - A} + A^{n-1} && \text{by } A(n-1),\\
&= \frac{1 - A^n}{1 - A} && \text{by algebra.}
\end{aligned}$$

(b) The recursion can be found by looking at the definition or by examining the proof in (a). It is G(1) = 1 and, for n > 1, $G(n) = G(n-1) + A^{n-1}$.

(c) Applying the theorem is straightforward. The formula equals 1 when n = 1, which agrees with G(1). By some simple algebra

$$\frac{1 - A^n}{1 - A} = \frac{(1 - A^{n-1}) + (A^{n-1} - A^n)}{1 - A} = \frac{1 - A^{n-1}}{1 - A} + A^{n-1},$$

and so the formula satisfies the recursion.

(d) Letting A = y/x and cleaning up some fractions,

$$1 + \frac{y}{x} + \cdots + \left(\frac{y}{x}\right)^{n-1} = \frac{1 - (y/x)^n}{1 - y/x} = \frac{x^n - y^n}{x^{n-1}(x - y)}.$$

Let n = k + 1, multiply by $x^k$ and use the geometric series to obtain $x^k + x^{k-1}y + \cdots + y^k = \dfrac{x^{k+1} - y^{k+1}}{x - y}$.
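The closed forms in DT-4.4 to DT-4.6 are easy to test against their recurrences. This sketch assumes the recurrences are the ones whose characteristic equations and initial values appear above: $a_n = 6a_{n-1} - 9a_{n-2}$ with $a_0 = 0$, $a_1 = 3$; $A_n = 3A_{n-1} - 2A_{n-2}$ with $A_0 = 1$, $A_1 = 3$; and $a_n = 2a_{n-1} - a_{n-2}$ with $a_0 = 2$, $a_1 = 1$. The helper name is ours.

```python
def iterate(c1, c2, a0, a1, count):
    """First `count` terms of a_n = c1*a_{n-1} + c2*a_{n-2} with a_0, a_1 given."""
    terms = [a0, a1]
    while len(terms) < count:
        terms.append(c1 * terms[-1] + c2 * terms[-2])
    return terms[:count]


N = 12
dt44 = iterate(6, -9, 0, 3, N)  # x^2 - 6x + 9 = 0, a_0 = 0, a_1 = 3
dt45 = iterate(3, -2, 1, 3, N)  # x^2 - 3x + 2 = 0, A_0 = 1, A_1 = 3 (A_n = a_{n+2})
dt46 = iterate(2, -1, 2, 1, N)  # x^2 - 2x + 1 = 0, a_0 = 2, a_1 = 1

print(dt44 == [n * 3**n for n in range(N)])
print(dt45 == [2 ** (n + 1) - 1 for n in range(N)])
print(dt46 == [2 - n for n in range(N)])
```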
DT-4.8 We will use Theorem 7 to prove our conjectures are correct.

(a) Writing out the first few terms gives A, A/(1 + A), A/(1 + 2A), A/(1 + 3A), etc. It appears that $a_k = A/(1 + kA)$. Since A > 0, the denominators are never zero. When k = 0, A/(1 + kA) = A, which satisfies the initial condition. We check the recursion:

$$\frac{a_{k-1}}{1 + a_{k-1}} = \frac{A/(1 + (k-1)A)}{1 + A/(1 + (k-1)A)} = \frac{A}{1 + (k-1)A + A} = \frac{A}{1 + kA},$$

which is the conjectured value for $a_k$.

(b) Writing out the first few terms gives C, AC + B, $A^2C + AB + B$, $A^3C + A^2B + AB + B$, $A^4C + A^3B + A^2B + AB + B$, etc. Here is one possible formula:

$$a_k = A^kC + B\left(A^{k-1} + A^{k-2} + \cdots + A + 1\right).$$

Here is a second possibility:

$$a_k = A^kC + B\,\frac{1 - A^k}{1 - A} \qquad (A \ne 1).$$

Using the previous exercise, you can see that they are equal. We leave it to you to give a proof of correctness for both formulas, without using the previous exercise.

DT-4.9 We use Theorem 7. The formula gives the correct value for k = 0. The recursion checks because

$$\begin{aligned}
A + B(k-1)\frac{(k-1)^2 - 1}{3} + Bk(k-1) &= A + B(k-1)k\left(\frac{k-2}{3} + 1\right)\\
&= A + B(k-1)k(k+1)/3\\
&= A + Bk(k^2 - 1)/3.
\end{aligned}$$

This completes the proof.

DT-4.10 (a) We apply Theorem 7, but there is a little complication: the formula starts at k = 1, so we cannot check the recursion for k = 1. Thus we need $a_1$ to be the initial condition. From the recursion, $a_1 = 2A - C$, which we take as our initial condition, and we use the recursion for k > 1. You should verify that the formula gives $a_1$ correctly and that the formula satisfies the recursion when k > 1.

(b) From the last part of Exercise 4.7 with x = 2 and y = −1, we obtain

$$\sum_{i=0}^{k} 2^{k-i}(-1)^i = \frac{2^{k+1} - (-1)^{k+1}}{3}.$$

Make sure you can do the calculations to derive this.

DT-4.11 Let $p_k$ denote the probability that the gambler is ruined if he starts with 0 ≤ k ≤ Q
dollars. Note that $p_0 = 1$ and $p_Q = 0$. Assume 1 < k ≤ Q. Then the recurrence relation $p_{k-1} = (1/2)p_k + (1/2)p_{k-2}$ holds. Solving for $p_k$ gives $p_k = 2p_{k-1} - p_{k-2}$. This looks familiar. It is a two-term linear recurrence relation. But the setup was a little strange! We would expect to know $p_0$ and $p_1$ and would expect the values of $p_k$ to make sense for all k ≥ 0. But here we have an interpretation of the $p_k$ only for 0 ≤ k ≤ Q, and we know $p_0$ and $p_Q$ instead of $p_0$ and $p_1$. Such a situation is not for faint-hearted students.

We are going to keep going as if we knew what we were doing. The characteristic equation is $r^2 - 2r + 1 = 0$. There is one root, r = 1. That means that the sequence $a_k = 1$, for all k = 0, 1, 2, ..., is a solution, and so is $b_k = k$, for k = 0, 1, 2, .... We need to find A and B such that $Aa_0 + Bb_0 = 1$ and $Aa_Q + Bb_Q = 0$. We find that A = 1 and B = −1/Q. Thus we have the general solution

$$p_k = 1 - \frac{k}{Q} = \frac{Q - k}{Q}.$$

Note that $p_k$ is defined for all k ≥ 0, as it would be for any such linear two-term recurrence. The fact that we are only interested in it for 0 ≤ k ≤ Q is no problem for the theory.

Suppose a rich student, Brently Q. Snodgrass the III, has 8,000 dollars and wants to play the coin toss game to make 10,000 dollars, so that he has 2,000 his parents don't know about. His probability of being ruined is (10,000 − 8,000)/10,000 = 1/5. His probability of getting his extra 2,000 dollars is 4/5. A poor student who had only 100 dollars and wanted to make 2,000 dollars would have a probability of (2,100 − 100)/2,100 ≈ 0.95 of being ruined. Life isn't fair.

There is one consolation. The expected number of times Brently will have to toss the coin before the game ends is k(Q − k) = 8,000 × 2,000 = 16,000,000. At one toss every 10 seconds, that is about 44,444 hours, or about 1,111 weeks of tossing 40 hours per week. If he does get his 2,000 dollars, he will have been working as a “coin tosser” for over 21 years at a salary of about 4.5 cents per hour. He should get a minimum wage job instead!
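A quick Monte Carlo run can be used to check both $p_k = (Q - k)/Q$ and the expected number of tosses k(Q − k). The sketch uses a scaled-down stake (k = 16, Q = 20, the same 4 : 5 ratio as Brently's) because simulating about 16,000,000 tosses per game would be slow; the seed, the number of trials and the function names are arbitrary choices.

```python
import random


def play(k, goal, rng):
    """One fair game: win or lose 1 dollar per toss until reaching 0 or goal.

    Returns (ruined, number_of_tosses).
    """
    tosses = 0
    while 0 < k < goal:
        k += 1 if rng.random() < 0.5 else -1
        tosses += 1
    return k == 0, tosses


def simulate(k, goal, trials=20000, seed=1):
    """Estimate the ruin probability and the mean number of tosses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    ruined = total = 0
    for _ in range(trials):
        lost, tosses = play(k, goal, rng)
        ruined += lost
        total += tosses
    return ruined / trials, total / trials


start, target = 16, 20
ruin_rate, mean_tosses = simulate(start, target)
print(ruin_rate, (target - start) / target)   # compare with p_k = (Q - k)/Q
print(mean_tosses, start * (target - start))  # compare with k(Q - k)
```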
Solutions for Basic Concepts in Graph Theory

To specify a graph we must choose E ⊆ $P_2(V)$. Let N = $|P_2(V)|$. (Note that $N = \binom{n}{2}$.) There are $2^N$ subsets E of $P_2(V)$, and $\binom{N}{q}$ of them have cardinality q. This proves (a) and answers (b).

The sum is the number of ends of edges since, if x and y are the ends of an edge, the edge contributes 1 to the value of d(x) and 1 to the value of d(y). Since each edge has two ends, the sum is twice the number of edges.

Since $\sum_v d(v)$ is even if and only if the number of odd summands is even, it follows that there are an even number of v for which d(v) is odd.


(a) The graph is isomorphic to Q. The correspondence between vertices is given by

$$\varphi = \begin{pmatrix} A & B & C & D & E & F & G & H\\ H & A & C & E & F & D & G & B \end{pmatrix},$$

where the top row corresponds to the vertices of Q.

(b) The graph Q′ is not isomorphic to Q. It can be made isomorphic by deleting one edge and adding another. You should try to figure out which edges these are.

(a) (0, 2, 2, 3, 4, 4, 4, 5) is the degree sequence of Q.

(b) If a pictorial representation of R can be created by labeling P′(Q) with the edges and vertices of R, then R has degree sequence (0, 2, 2, 3, 4, 4, 4, 5) because the degree sequence is determined by φ.

(c) This is the converse of (b). It is false. The following graph has degree sequence (0, 2, 2, 3, 4, 4, 4, 5) but cannot be morphed into the form P′(Q).


(a) There is no graph Q with degree sequence (1, 1, 2, 3, 3, 5) since the sum of the degrees is odd. The sum of the degrees of a graph is 2|E| and must, therefore, be even.

(d) (answers (b) and (c) as well) There is a graph with degree sequence (1, 2, 2, 3, 3, 5), with no loops or parallel edges. Take vertices $v_1, \ldots, v_6$, join $v_6$ to each of the other five vertices, and add the edges $\{v_2, v_4\}$, $\{v_3, v_5\}$, and $\{v_4, v_5\}$.

(e) (answers (f) as well) A graph with degree sequence (3, 3, 3, 3) has (3 + 3 + 3 + 3)/2 = 6 edges and, of course, 4 vertices. That is the maximum number, $\binom{4}{2} = 6$, of edges that a simple graph with 4 vertices can have. It is easy to construct such a graph: draw the four vertices and make all possible connections. This graph is called the complete graph on 4 vertices.

(g) There is no simple graph (or graph without loops or parallel edges) with degree sequence (3, 3, 3, 5). See (f): in a simple graph with 4 vertices, each vertex has at most 3 neighbors, so no vertex can have degree 5.
(h) Arguments similar to those in (f) apply to the complete graph with degree sequence (4, 4, 4, 4, 4). Such a graph would have 20/2 = 10 edges. But $\binom{5}{2} = 10$. To construct such a graph, use 5 vertices and make all possible connections.

(i) There is no such graph. See (h).

GT-2.2 Each of (a) and (c) has just one pair of parallel edges (edges with the same endpoints), while (b) and (d) each have two pairs of parallel edges. Thus neither (b) nor (d) is equivalent to (a) or (c). Vertex 1 of (b) has degree 4, but (d) has no vertices of degree 4. Thus (b) and (d) are not equivalent. It turns out that (a) and (c) are equivalent. Can you see how to make the forms correspond?

GT-2.3 (a) We know that the expected number of triangles behaves like $(np)^3/6$. This equals 1 when $p = 6^{1/3}/n$.

By Example 6, the expected number of edges is $\binom{n}{2}p$, which behaves like $(n^2/2)p$ for large n. Thus we expect about $(6^{1/3}/2)\,n$ edges.

GT-2.4 Introduce random variables $X_S$, one for each S ∈ $P_k(V)$. Reasoning as in the example, $E(X_S) = p^K$, where $K = \binom{k}{2}$ is the number of edges that must be present. Thus the expected number of sets of k vertices with all edges present is $\binom{n}{k}p^K$.

For large n, this behaves like $n^k p^K/k!$, which will be 1 when $p = (k!/n^k)^{1/K}$. For large n, the expected number of edges behaves like $(n^2/2)(k!/n^k)^{1/K}$. This last number has the form $Cn^a$, where $C = (k!)^{1/K}/2$ and $a = 2 - k/K = 2 - 2/(k-1) = 2(k-2)/(k-1)$.


The first part comes from factoring out (3) p° from the last equation in Example 7. To 
obtain the inequality, replace (1 —p*) with (1—p?), factor it out, and use 1+3(n—3) < 
3n. 


Since E ⊆ $P_2(V)$, we have a simple graph. Regardless of whether you are in set C or S, following an edge takes you into the other set. Thus, following a path with an odd number of edges takes you to the opposite set from where you started, while a path with an even number of edges takes you back to your starting set. Since a cycle returns to its starting vertex, it obviously returns to its starting set, so every cycle has an even number of edges.
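The argument above is the idea behind the usual breadth-first two-coloring test: give a starting vertex one color, give each newly reached neighbor the other color, and report failure if an edge ever joins two vertices of the same color, which happens exactly when some cycle has an odd number of edges. This is a sketch with the graph given as an adjacency dictionary; the function name is ours.

```python
from collections import deque


def two_coloring(adj):
    """Return a dict vertex -> 0/1 in which every edge joins different colors,
    or None if some cycle has an odd number of edges."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # crossing an edge switches sides
                    queue.append(v)
                elif color[v] == color[u]:
                    return None              # an edge inside one side: odd cycle
    return color


square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(two_coloring(square))    # {1: 0, 2: 1, 4: 1, 3: 0}
print(two_coloring(triangle))  # None
```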


(a) The graph is not Eulerian. The longest trail has 5 edges; the longest circuit has 4 edges.

(b) The longest trail has 9 edges; the longest circuit has 8 edges.

(c) The longest trail has 13 edges (an Eulerian trail starting at C and ending at D). The longest circuit has 12 edges (remove edge f).

(d) This graph has an Eulerian circuit (12 edges).

(a) The graph is Hamiltonian.
(b) The graph is Hamiltonian.
(c) The graph is not Hamiltonian. There is a cycle that includes all vertices except K.
(d) The graph is Hamiltonian.


(a) There are $|V \times V| = n^2$ potential edges to choose from. Since there are two choices for each edge (either in the digraph or not), we get $2^{n^2}$ simple digraphs.

(b) With loops forbidden, our possible edges include all elements of V × V except those of the form (v, v) with v ∈ V. Thus there are $2^{n(n-1)}$ loopless simple digraphs. An alternative derivation is to note that a simple graph has at most $\binom{n}{2}$ edges, and we have 4 possible choices for each of them in constructing a digraph: (i) omit the edge, (ii) include the edge directed one way, (iii) include the edge directed the other way, and (iv) include two edges, one directed each way. This gives $4^{\binom{n}{2}} = 2^{n(n-1)}$. The latter approach is not useful in doing part (c).

(c) Given the set S of possible edges, we want to choose q of them. This can be done in $\binom{|S|}{q}$ ways. In the general case, the number is $\binom{n^2}{q}$, and in the loopless case it is $\binom{n(n-1)}{q}$.

(a) Let V = {u, v} and E = {(u, v), (v, u)}.

(b) For each {u, v} ∈ $P_2(V)$ we have three choices: (i) select the edge (u, v), (ii) select the edge (v, u), or (iii) have no edge between u and v. Let $N = |P_2(V)| = \binom{n}{2}$. There are $3^N$ oriented simple graphs.

(c) We can choose q elements of $P_2(V)$ and then orient each of them in one of two ways. This gives us $\binom{N}{q}2^q$.
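For a small vertex set the counts above can be confirmed by brute force over every set of non-loop ordered pairs. This sketch takes V = {0, 1, 2} (so n = 3) and uses helper names of our own; it checks the loopless counts $2^{n(n-1)}$ and $\binom{n(n-1)}{q}$ as well as the oriented counts $3^N$ and $\binom{N}{q}2^q$.

```python
from collections import Counter
from itertools import product
from math import comb


def count_by_edges(n):
    """Brute force over all loopless simple digraphs on {0, ..., n-1}.

    Returns two Counters keyed by number of edges: all loopless simple
    digraphs, and those that are oriented simple graphs.
    """
    loopless = [(u, v) for u, v in product(range(n), repeat=2) if u != v]
    digraphs, oriented = Counter(), Counter()
    for mask in range(2 ** len(loopless)):
        edges = {e for i, e in enumerate(loopless) if mask >> i & 1}
        digraphs[len(edges)] += 1
        # an oriented simple graph never contains both (u, v) and (v, u)
        if all((v, u) not in edges for u, v in edges):
            oriented[len(edges)] += 1
    return digraphs, oriented


n = 3
N = comb(n, 2)
digraphs, oriented = count_by_edges(n)
print(sum(digraphs.values()), 2 ** (n * (n - 1)))  # loopless simple digraphs
print(sum(oriented.values()), 3 ** N)              # oriented simple graphs
print([oriented[q] for q in range(N + 1)], [comb(N, q) * 2 ** q for q in range(N + 1)])
```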

(a) For all x ∈ S, x | x. For all x, y ∈ S, if x | y and x ≠ y, then y does not divide x. For all x, y, z ∈ S, x | y and y | z imply that x | z.

(b) The covering relation is

{(2, 4), (2, 6), (2, 10), (2, 14), (3, 6), (3, 9), (3, 15), (4, 8), (4, 12), (5, 10), (5, 15), (6, 12), (7, 14)}.

We leave it to you to draw the picture!
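The covering pairs can be generated mechanically. This sketch assumes S = {2, 3, ..., 15}, which is what the listed pairs suggest, so treat that choice as an inference: (x, y) is a covering pair when x properly divides y and no other z in S satisfies x | z and z | y.

```python
def divides(a, b):
    """True if a | b."""
    return b % a == 0


def covering_relation(S):
    """Pairs (x, y) with x | y, x != y, and no z in S strictly between them."""
    S = list(S)
    pairs = set()
    for x in S:
        for y in S:
            if x != y and divides(x, y):
                between = any(
                    z not in (x, y) and divides(x, z) and divides(z, y) for z in S
                )
                if not between:
                    pairs.add((x, y))
    return pairs


print(sorted(covering_relation(range(2, 16))))
```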


(a) Suppose G is a connected graph with v vertices and v edges. A connected graph is a tree if and only if the number of vertices is one more than the number of edges. Thus G is not a tree and must have at least one cycle. This proves the base case, n = 0. Suppose n > 0 and G is a connected graph with v vertices and v + n edges. We know that the graph is not a tree and thus has a cycle. We know that removing an edge from a cycle does not disconnect the graph. However, removing the edge destroys any cycles that contain it. Hence the new graph G′ is connected and contains one less edge and at least one less cycle than G. By the induction hypothesis, G′ has at least n cycles. Thus G has at least n + 1 cycles.

(b) Let G be a graph with components $G_1, \ldots, G_c$. With subscripts denoting components, $G_i$ has $v_i$ vertices, $e_i = v_i + n_i$ edges and at least $n_i + 1$ cycles. From the last two formulas, $G_i$ has at least $1 + e_i - v_i$ cycles. Now sum over i to see that G has at least c + e − v cycles.

(c) For each n we wish to construct a simple graph that has n more edges than vertices but has only n + 1 cycles. There are many possibilities. Here is one solution. The vertices are v and, for 0 ≤ i ≤ n, $x_i$ and $y_i$. The edges are $\{v, x_i\}$, $\{v, y_i\}$, and $\{x_i, y_i\}$. (This gives n + 1 triangles joined at v.) There are 1 + 2(n + 1) vertices, 3(n + 1) edges, and n + 1 cycles.


(a) $\sum_{v \in V} d(v) = 2|E|$. For a tree, |E| = |V| − 1. Since $2|V| = \sum_{v \in V} 2$,

$$2 = 2|V| - 2|E| = \sum_{v \in V} (2 - d(v)).$$

(b) Suppose that T is more than just a single vertex. Since T is connected, d(v) ≠ 0 for all v. Let $n_k$ be the number of vertices of T of degree k. By the previous result, $\sum_{k \ge 1} (2 - k)n_k = 2$. Rearranging gives $n_1 = 2 + \sum_{k > 2} (k - 2)n_k$. If $n_m \ge 1$, the sum is at least m − 2, and so $n_1 \ge m$.

(c) Let the vertices be u and $v_i$ for 1 ≤ i ≤ m. Let the edges be $\{u, v_i\}$ for 1 ≤ i ≤ m.
(a) No such tree exists. A tree with six vertices must have five edges.

(b) No such tree exists. Such a tree must have at least one vertex of degree three or more and hence at least three vertices of degree one.

(c) A graph with two connected components, each a tree with five vertices, will have this property.

(d) No such graph exists.

(f) Such a graph must have at least c + e − v = 1 + 6 − 4 = 3 cycles.

(g) No such graph exists. If the graph has no cycles, then each component is a tree. In such a graph, the number of vertices is strictly greater than the number of edges for each component and hence for the whole graph.


No such tree exists. 


(a) The idea is that for a rooted planar tree of height h, having at most 2 children 
for each non-leaf, the tree with the most leaves occurs when each non-leaf vertex has 
exactly 2 children. You should sketch some cases and make sure you understand this 
point. For this case | = 2” and so log,(l) = h. Any other rooted planar tree of height 
h, having at most 2 children for each non-leaf, is a subtree (with the same root) of this 
maximal-leaf binary tree and thus has fewer leaves. 


(b) Knowing the number of leaves does not bound the height of a tree — it can be 
arbitrarily large. 


(c) The maximum height is $h = l - 1$. One leaf has height 1, one height 2, etc., one of height $l - 2$ and, finally, two of height $l - 1$.


(d) (answers (e) as well) $\lceil \log_2(l) \rceil$ is a lower bound for the height of any binary tree with $l$ leaves. It is easy to see that you can construct a full binary tree with $l$ leaves and height $\lceil \log_2(l) \rceil$.
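Here is a small Python sketch of parts (d) and (e) that uses exact integer arithmetic. `min_height(l)` computes $\lceil \log_2(l) \rceil$. `balanced_height(l)` gives the height of a full binary tree built by splitting the $l$ leaves as evenly as possible between the two subtrees at each vertex. That construction is our own choice; any full binary tree of that height would do.

```python
def min_height(l):
    """ceil(log2(l)) for l >= 1: the least h with 2**h >= l."""
    return (l - 1).bit_length()


def balanced_height(l):
    """Height of a full binary tree with l leaves whose two subtrees
    at each vertex get floor(l/2) and ceil(l/2) leaves."""
    if l == 1:
        return 0                                   # a single leaf is a tree of height 0
    half = l // 2
    return 1 + max(balanced_height(half), balanced_height(l - half))


for l in (1, 2, 5, 21, 32, 33):
    print(l, min_height(l), balanced_height(l))
```

For instance, `min_height(33)` is 6. This agrees with the answer below that a binary tree of height 5 cannot have 33 leaves.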


(a) A binary tree with 35 leaves and height 100 is possible. 


(b) A full binary tree with 21 leaves can have height at most 20. So such a tree of 
height 21 is impossible. 


(c) A binary tree of height 5 can have at most 32 leaves. So one with 33 leaves is 
impossible. 


(d) No way! The total number of vertices is 


(a) For (1) there are four spanning trees. For (2) there are 8 spanning trees. Note that there are $\binom{5}{3} = 10$ ways to choose three edges. Eight of these 10 choices result in spanning trees; the other two choices result in cycles (with vertex sequences $(A, B, D)$ and $(B, C, D)$). For (3) there are 16 spanning trees.
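The count of 8 for graph (2) can be double-checked with a brute-force Python sketch. The figure is not reproduced here, so the edge list is an assumption read off from the text: vertices A, B, C, D and the five edges of the two triangles $(A, B, D)$ and $(B, C, D)$. The code tries all $\binom{5}{3}$ three-edge subsets and keeps those with no cycle. On $|V|$ vertices, $|V| - 1$ edges without a cycle always form a spanning tree.

```python
from itertools import combinations


def is_acyclic(vertices, chosen_edges):
    """Union-find test: True if the chosen edges contain no cycle."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]          # path halving
            v = parent[v]
        return v

    for a, b in chosen_edges:
        ra, rb = find(a), find(b)
        if ra == rb:                               # a and b already joined: this edge closes a cycle
            return False
        parent[ra] = rb
    return True


def spanning_trees(vertices, edges):
    """All spanning trees: |V| - 1 edges with no cycle form a tree on |V| vertices."""
    return [c for c in combinations(edges, len(vertices) - 1)
            if is_acyclic(vertices, c)]


V = ["A", "B", "C", "D"]
E = [("A", "B"), ("B", "D"), ("D", "A"), ("B", "C"), ("C", "D")]   # assumed edges of graph (2)
trees = spanning_trees(V, E)
print(len(trees))                                  # 8
```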


(b) For (1) there is one. For (2) there are two. For (3) there are two. 
(c) For (1) there are two. For (2) there are four. For (3) there are six. 
(d) For (1) there are two. For (2) there are three. For (3) there are six. 


GT-3.7 (a) For (1) there are three minimum spanning trees. For (2) there are two minimum 
spanning trees. For (3) there is one minimum spanning tree. 


(b) For (1) there is one minimum spanning tree up to isomorphism. For (2) there are 
two. For (3) there is one. 


(c) For (1) there is one. For (2) there is one. For (3) there are four. 
(d) For (1) there are two. For (2) there is one. For (3) there are four. 


GT-3.8 (a) (and (b)) There are 21 vertices, so the minimum spanning tree has 20 edges. Its 
weight is 30. We omit details. 
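The text omits the details. For readers who want to carry out such a computation themselves, here is a minimal Python sketch of Kruskal's algorithm, one of the minimum weight spanning tree algorithms the book covers along with Prim's. The graph of this exercise is not reproduced here, so the small weighted graph below is purely hypothetical. The result always has $|V| - 1$ edges, just as the 21-vertex tree above has 20.

```python
def kruskal(vertices, weighted_edges):
    """Return (total weight, tree edges) of a minimum weight spanning tree.

    weighted_edges is a list of (weight, u, v); the graph is assumed connected.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]          # path halving
            v = parent[v]
        return v

    total, tree = 0, []
    for w, u, v in sorted(weighted_edges):         # consider edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                               # keep the edge only if it joins two components
            parent[ru] = rv
            total += w
            tree.append((u, v))
    return total, tree


V = ["A", "B", "C", "D"]                           # hypothetical example graph
E = [(1, "A", "B"), (2, "B", "C"), (3, "C", "D"), (4, "D", "A"), (5, "A", "C")]
print(kruskal(V, E))   # (6, [('A', 'B'), ('B', 'C'), ('C', 'D')])
```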


(c) Note that K is the only vertex common to the two bicomponents of this graph. Whenever this happens (two bicomponents, common vertex), the depth-first spanning tree rooted at that common vertex has exactly two “principal subtrees” at the root. In other words, the root of the depth-first spanning tree has down-degree two (two children). The two children of K can be taken to be P and L. P is the root of a subtree consisting of 5 vertices: 4 with one child and one leaf. L is the root of a subtree consisting of 15 vertices: 14 with one child and one leaf.


GT-4.1 (a) The algorithm that has running time $100n$ is better than the one with running time $n^2$ for $n > 100$. $100n$ is better than $(2^{n/10} - 1)100$ for $n > 60$. For $1 \le n < 10$, $(2^{n/10} - 1)100$ is worse than $n^2$. At $n = 10$ they are the same. For $10 < n < 43$, $n^2$ is worse than $(2^{n/10} - 1)100$. For $n > 43$, $(2^{n/10} - 1)100$ is worse than $n^2$. Here are the graphs:


[Figure: the running times $100n$, $n^2$ and $(2^{n/10} - 1)100$ plotted for $0 \le n \le 100$, vertical axis up to 20000.]
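The crossover points described in (a) can be located with a short Python search. This is a sketch, and the function names are our own. Each call returns the first $n$ at which the first running time is larger (that is, worse) than the second.

```python
def linear(n):
    return 100 * n


def quadratic(n):
    return n * n


def exponential(n):
    return 100 * (2 ** (n / 10) - 1)


def first_n_where_worse(slow, fast, start, stop=10_000):
    """Smallest n in [start, stop) with slow(n) > fast(n), or None."""
    for n in range(start, stop):
        if slow(n) > fast(n):
            return n
    return None


print(first_n_where_worse(quadratic, linear, 1))        # 101
print(first_n_where_worse(exponential, linear, 1))      # 60
print(first_n_where_worse(exponential, quadratic, 11))  # 43
```

These results agree with the ranges stated in (a). The last search starts at $n = 11$ because the exponential is already worse than $n^2$ for $1 \le n < 10$.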


(b) When $n$ is very large, B is fastest and C is slowest. This is because, of two polynomials, the one with the lower degree is eventually faster, and an exponential function grows faster than any polynomial.
(a) The most direct way to prove this is to use Example 23 (additional observations on $\Theta$ and $O$):

$$\lim_{n \to \infty} \frac{g(n)}{f(n)} = C > 0 \quad\text{implies}\quad g(n) \text{ is } \Theta(f(n)).$$

Let $p(n) = \sum_{i=0}^{k} b_i n^i$ with $b_k > 0$. Take $f(n) = n^k$, $g(n) = p(n)$ and $C = b_k > 0$. Thus, $p(n)$ is $\Theta(n^k)$, hence the equivalence class of each is the same set: $\Theta(p(n))$ is $\Theta(n^k)$.

(b) $O(p(n))$ is $O(n^k)$ follows from (a).

(c) $\lim_{n \to \infty} p(n)/a^n = 0$. This requires some calculus. By applying l’Hospital’s Rule $k$ times, we see that the limit is $\lim_{n \to \infty} \bigl(k!\, b_k/(\ln a)^k\bigr)/a^n$, which is 0. Any algorithm with exponential running time is eventually much slower than a polynomial time algorithm.
(d) For $a^{p(n)}$ to be $\Theta(a^{Cn^k})$, we must have positive constants $A$ and $B$ such that $A \le a^{p(n)}/a^{Cn^k} \le B$. Taking logarithms gives us $\log_a A \le p(n) - Cn^k \le \log_a B$. The center of this expression is a polynomial which is not constant unless $p(n) = Cn^k + D$ for some constant $D$, the case which is ruled out. Thus $p(n) - Cn^k$ is a nonconstant polynomial and so is unbounded, which contradicts these bounds.


Here is a general method of working this type of problem. Let $p(n) = \sum_{i=0}^{k} b_i n^i$ with $b_k > 0$. We show by using the definition that $\Theta(p(n))$ is $\Theta(n^k)$. Let $s = \sum_{i=0}^{k-1} |b_i|$ and assume that $n > 2s/b_k$. We have

$$\bigl|p(n) - b_k n^k\bigr| \le \sum_{i=0}^{k-1} |b_i|\, n^i \le \sum_{i=0}^{k-1} |b_i|\, n^{k-1} = s\, n^{k-1} < b_k n^k/2.$$

Thus $|p(n)| \ge b_k n^k - b_k n^k/2 = (b_k/2) n^k$ and also $|p(n)| \le b_k n^k + b_k n^k/2 = (3b_k/2) n^k$.

The definition is satisfied with $N = 2s/b_k$, $A = b_k/2$ and $B = 3b_k/2$. If you want to show, using the definition, that $\Theta(p(n))$ is $\Theta(Kn^k)$ for some $K > 0$, replace $A$ with $A' = A/K$ and $B$ with $B' = B/K$.

In our particular cases we can be sloppy and it gets easier. Take (a) as an example.
(a) For $g(n) = n^3 + 5n^2 + 10$, choose $N$ such that $n^3 > 5n^2 + 10$ for $n > N$. You can be ridiculous in the choice of $N$. $N^3 > 5N^2 + 10$ is valid if $1 > 5/N + 10/N^3$, so $N = 10$ is plenty big enough. If $n^3 > 5n^2 + 10$, then $n^3 < g(n) < 2n^3$. So taking $A = 1$ and $B = 2$ works for the definition: $An^3 < g(n) < Bn^3$, showing $g$ is $\Theta(n^3)$. If you want to use $f(n) = 20n^3$ as the problem calls for, replace these constants by $A' = A/20$ and $B' = B/20$. Thus, $A'(20n^3) < g(n) < B'(20n^3)$ for $n > N$.
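The constants from the general method, and the sloppier ones used for $g(n) = n^3 + 5n^2 + 10$, can be spot-checked with exact rational arithmetic. This Python sketch is only a numerical sanity check over a finite range, not a proof. It assumes integer coefficients listed from $b_0$ up to $b_k$, with $b_k > 0$.

```python
from fractions import Fraction


def p(coeffs, n):
    """Evaluate the sum of coeffs[i] * n**i."""
    return sum(b * n**i for i, b in enumerate(coeffs))


def theta_constants(coeffs):
    """Constants from the general method: with b_k = coeffs[-1] > 0 and
    s = sum of |b_i| for i < k, return N = 2s/b_k, A = b_k/2, B = 3b_k/2."""
    *lower, b_k = coeffs
    s = sum(abs(b) for b in lower)
    return Fraction(2 * s, b_k), Fraction(b_k, 2), Fraction(3 * b_k, 2)


def bounds_hold(coeffs, A, B, n_from, n_to):
    """Check A n^k <= p(n) <= B n^k for every integer n in [n_from, n_to)."""
    k = len(coeffs) - 1
    return all(A * n**k <= p(coeffs, n) <= B * n**k for n in range(n_from, n_to))


g = [10, 0, 5, 1]                     # g(n) = n^3 + 5n^2 + 10, as b_0, b_1, b_2, b_3
N, A, B = theta_constants(g)
print(N, A, B)                        # 30 1/2 3/2
print(bounds_hold(g, A, B, int(N) + 1, 2000))   # True: general constants, n > N
print(bounds_hold(g, 1, 2, 10, 2000))           # True: the "sloppy" A = 1, B = 2 for n >= 10
```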


This problem should make you appreciate the much easier approach of Example 23. 


(a) There is an explicit formula for the sum of the squares of integers.

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}.$$

This is a polynomial of degree 3, hence the sum is $O(n^3)$.
(b) There is an explicit formula for the sum of the cubes of integers.

$$\sum_{i=1}^{n} i^3 = \frac{n^2(n+1)^2}{4}.$$

This is a polynomial of degree 4, hence the sum is $O(n^4)$.
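This short Python check compares both closed forms with direct summation. The ratios printed at the end show the leading coefficients $1/3$ and $1/4$ that make the sums $O(n^3)$ and $O(n^4)$. It is a sanity check, not a proof.

```python
def power_sum(n, p):
    """Direct sum of i**p for i = 1, ..., n."""
    return sum(i**p for i in range(1, n + 1))


def squares_formula(n):
    return n * (n + 1) * (2 * n + 1) // 6


def cubes_formula(n):
    return (n * (n + 1) // 2) ** 2            # equals n^2 (n+1)^2 / 4


for n in (10, 100, 1000):
    # the ratios approach the leading coefficients 1/3 and 1/4
    print(n, power_sum(n, 2) / n**3, power_sum(n, 3) / n**4)
```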


(c) To show $\sum_{i=1}^{n} i^{1/2}$ is $O(n^{3/2})$ it helps to know a little calculus. You can interpret the sum as upper and lower Riemann sum approximations to the integral of $f(x) = x^{1/2}$ with $\Delta x = 1$:

$$\int_0^n x^{1/2}\,dx < \sum_{i=1}^{n} i^{1/2} = \sum_{i=1}^{n-1} i^{1/2} + n^{1/2} \le \int_1^n x^{1/2}\,dx + n^{1/2}.$$

Since $\int x^{1/2}\,dx = 2x^{3/2}/3 + C$, you can fill in the details to get $O(n^{3/2})$.
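The Riemann sum bounds in (c) can be checked numerically with the Python sketch below, which is an illustration only. `riemann_bounds(n)` evaluates $\int_0^n x^{1/2}\,dx$ and $\int_1^n x^{1/2}\,dx + n^{1/2}$ using the antiderivative $2x^{3/2}/3$. The ratio of the sum to $n^{3/2}$ should approach $2/3$.

```python
import math


def sqrt_sum(n):
    """The sum of i**(1/2) for i = 1, ..., n."""
    return sum(math.sqrt(i) for i in range(1, n + 1))


def riemann_bounds(n):
    """Lower and upper bounds obtained from the integrals of x**(1/2)."""
    lower = 2 * n**1.5 / 3                         # integral of x^(1/2) from 0 to n
    upper = 2 * (n**1.5 - 1) / 3 + math.sqrt(n)    # integral from 1 to n, plus n^(1/2)
    return lower, upper


for n in (1, 10, 100, 10_000):
    lo, hi = riemann_bounds(n)
    s = sqrt_sum(n)
    print(n, lo <= s <= hi, s / n**1.5)            # the last column approaches 2/3
```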


The method used in (c) will also work for (a) and (b). The idea works in general: Suppose $f(x) > 0$ and $f'(x) > 0$. Let $F(x)$ be an antiderivative of $f(x)$. If $f(n)$ is $O(F(n))$, then $\sum_{i=1}^{n} f(i)$ is $O(F(n))$. There is a similar result if $f'(x) < 0$: replace “$f(n)$ is $O(F(n))$” with “$f(1)$ is $O(F(n))$.”


(a) To show $\sum_{i=1}^{n} i^{-1}$ is $O(\log_b(n))$ for any base $b > 1$, use the Riemann sum trick from the previous exercise: $\int_1^n x^{-1}\,dx = \ln(n)$. This shows that $\sum_{i=1}^{n} i^{-1}$ is $O(\ln(n))$. But $\ln(x) = \ln(b)\log_b(x)$ (as we learned in high school). Thus, $\log_b(x)$ and $\ln(x)$ belong to the same $\Theta$ equivalence class, as they differ by the positive constant multiple $\ln(b)$ (recall $b > 1$).
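Since $1/x$ is decreasing, the same trick gives $\ln(n+1) \le \sum_{i=1}^{n} i^{-1} \le 1 + \ln(n)$. The Python sketch below is our own illustration. It checks these bounds and shows the ratio to $\log_2(n)$ drifting toward the constant $\ln(2)$, as the change-of-base argument predicts.

```python
import math


def harmonic(n):
    """The sum of 1/i for i = 1, ..., n."""
    return sum(1 / i for i in range(1, n + 1))


def harmonic_bounds(n):
    # 1/x is decreasing, so the Riemann sums give
    # ln(n + 1) = integral_1^{n+1} dx/x <= H_n <= 1 + integral_1^n dx/x = 1 + ln(n)
    return math.log(n + 1), 1 + math.log(n)


for n in (1, 10, 1000):
    lo, hi = harmonic_bounds(n)
    print(n, lo <= harmonic(n) <= hi)

for n in (10, 1000, 100_000):
    print(n, harmonic(n) / math.log(n, 2))   # tends (slowly) to ln(2), about 0.693
```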


(b) First you need to note that $\log_b(n!) = \sum_{i=1}^{n} \log_b(i)$. Use the Riemann sum trick again.

$$\int_1^n \log_b(x)\,dx = \log_b(e) \int_1^n \ln(x)\,dx = \log_b(e)\bigl(n\ln(n) - n + 1\bigr).$$

Thus, the sum is $O(n\ln(n) - n + 1)$, which is $O(n\ln(n))$, which is $O(n\log_b(n))$.
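Because $\ln(x)$ is increasing, the Riemann sums in (b) give $n\ln(n) - n + 1 \le \ln(n!) \le n\ln(n) - n + 1 + \ln(n)$. These are natural logarithms; the base-$b$ version differs by the factor $\log_b(e)$. The short Python check below relies on the fact that `math.log` accepts arbitrarily large integers such as `math.factorial(n)`.

```python
import math


def log_factorial(n):
    # math.log accepts arbitrarily large Python ints, so this is exact up to rounding
    return math.log(math.factorial(n))


def integral_log(n):
    # integral of ln(x) from 1 to n
    return n * math.log(n) - n + 1


for n in (1, 10, 1000):
    lo = integral_log(n)
    hi = lo + math.log(n)          # adding the last term ln(n) gives the upper bound
    print(n, lo <= log_factorial(n) <= hi)
```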


(c) Use Stirling’s approximation for $n!$: $n!$ is asymptotic to $(n/e)^n (2\pi n)^{1/2}$. Thus, $n!$ is $O\bigl((n/e)^n (2\pi n)^{1/2}\bigr)$, by Example 23. Since $(2\pi n)^{1/2} = (2\pi e)^{1/2} (n/e)^{1/2}$, a little algebra rearranges the latter expression to get $O\bigl((n/e)^{n+1/2}\bigr)$.
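Here is a numerical look at Stirling's approximation as a Python sketch. `log_ratio(n)` is $\ln\bigl(n!/((n/e)^n (2\pi n)^{1/2})\bigr)$, which tends to 0. The code also compares it with $1/(12n)$, a standard sharpening of Stirling's formula that the text does not state. `log_rearranged(n)` checks that $n!/(n/e)^{n+1/2}$ approaches the constant $(2\pi e)^{1/2}$.

```python
import math


def log_stirling(n):
    # natural log of (n/e)^n * (2*pi*n)^(1/2)
    return n * (math.log(n) - 1) + 0.5 * math.log(2 * math.pi * n)


def log_ratio(n):
    # ln( n! / Stirling(n) ); known to satisfy 0 < log_ratio(n) < 1/(12n)
    return math.log(math.factorial(n)) - log_stirling(n)


def log_rearranged(n):
    # ln( n! / (n/e)^(n + 1/2) ), which should approach ln((2*pi*e)^(1/2))
    return math.log(math.factorial(n)) - (n + 0.5) * (math.log(n) - 1)


for n in (1, 10, 100):
    print(n, math.exp(log_ratio(n)), 0 < log_ratio(n) < 1 / (12 * n))
print(log_rearranged(100), 0.5 * math.log(2 * math.pi * math.e))
```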


A single execution of “C(i,j) = C(i,j) + A(i,k)*B(k,j)” takes a constant amount of time, so its time is $O(1)$.

The loop on $k$ is done $n$ times, so its time is $nO(1)$, which is $O(n)$.

The loop on $j$ is done $n$ times and each time requires work that is $O(n)$. Thus its time is $nO(n)$, which is $O(n^2)$.

The loop on $i$ is done $n$ times, so its time is $nO(n^2)$, which is $O(n^3)$.
Alternatively, you could notice that the innermost loop takes the most time and “C(i,j) = C(i,j) + A(i,k)*B(k,j)” is executed once for each value of $i$, $j$, and $k$. Thus it is done $n^3$ times, so the time for the algorithm is $O(n^3)$.
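The count of $n^3$ can be confirmed by instrumenting a direct Python version of the triple loop. This is a sketch; it uses 0-based list indices rather than the 1-based $C(i,j)$ notation.

```python
def matmul_count(A, B):
    """Naive n x n matrix product; also counts executions of the innermost statement."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] = C[i][j] + A[i][k] * B[k][j]   # the statement timed in the text
                count += 1
    return C, count


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, count = matmul_count(A, B)
print(C, count)   # [[19, 22], [43, 50]] 8
```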


We use the Master Theorem. Since there is just one recursive call, $w = 1$ and $s_1(n) = q$. Since $0 \le n/2 - q \le 1/2$, $c = 1/2$. We have $T(n) = a_n + T(s_1(n))$ where $a_n$ is 1 or 2. Thus $a_n$ is $\Theta(n^0)$. In summary, $w = 1$, $c = 1/2$ and $b = 0$. Thus $d = -\log(1)/\log(1/2) = 0$ and so $T(n)$ is $O(\log n)$.
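The recurrence can also be explored directly. The exercise's algorithm is not restated here, so this Python sketch rests on assumptions: $q = \lfloor n/2 \rfloor$, $T(1) = 1$, and a hypothetical rule that takes $a_n = 1$ for even $n$ and $a_n = 2$ for odd $n$. It takes $k = \lfloor \log_2 n \rfloor$ halvings to reach 1, so $T(n)$ must lie between $k + 1$ and $2k + 1$. That is consistent with $T(n)$ being $O(\log n)$.

```python
def T(n):
    """Hypothetical cost model: T(1) = 1 and T(n) = a_n + T(n // 2),
    with a_n = 1 for even n and 2 for odd n (any choice of 1 or 2 would do)."""
    if n == 1:
        return 1
    a_n = 1 if n % 2 == 0 else 2
    return a_n + T(n // 2)


for n in (1, 10, 1000, 10**6):
    k = n.bit_length() - 1             # floor(log2 n) = number of halvings down to 1
    print(n, T(n), k + 1, 2 * k + 1)   # T(n) lies between k + 1 and 2k + 1
```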
Notation Index


$\exists$ (there exists) 48
$\forall$ (for all) 48
$B_n$ (Bell numbers) 27
$s \sim t$ (equivalence relation) 151
$\binom{n}{k}$ (binomial coefficient) 15
$\binom{n}{m_1, m_2, \ldots}$ (multinomial coefficient) 20
BFE(T) (breadth first edge sequence) 96, 175
BFV(T) (breadth first vertex sequence) 96, 175
C(n,k) (binomial coefficient) 15
Cov(X,Y) (covariance) 69
DFV(T) (depth first vertex sequence) 96, 175
DFE(T) (depth first edge sequence) 96, 175
$\mu_X$ (expectation or mean) 68
E(X) (expectation) 68
$g \circ f$ (composition) 51
$(n)_k$ (falling factorial) 9
$F_n$ (Fibonacci numbers) 137
$\lfloor x \rfloor$ (floor) 139
$(V,E)$ (simple graph) 148
$(V,E,\phi)$ (graph) 149
$\mathbb{N}$ (natural numbers) 13
$\underline{n}$ (first $n$ integers) 45
$O(\ )$ (Big oh notation) 184
$o(\ )$ (little oh notation) 186
$\mathcal{P}_k(A)$ ($k$-subsets of $A$) 15, 45
S(A) (permutations of A) 51
PER(A) (permutations of A) 51
POSV(T) (postorder sequence of vertices) 96
PREV(T) (preorder sequence of vertices) 96
$\mathbb{Q}$ (rational numbers) 45
$\mathbb{R}$ (real numbers) 28, 45
$\rho(X,Y)$ (correlation) 69
$\sigma_X$ (standard deviation) 69
S(n,k) (Stirling numbers) 25
$\Theta(\ )$ (rate of growth) 184
Var(X) (variance) 69
$\mathbb{Z}$ (integers) 13, 45


Probability notation
$\mu_X$ (expectation, or mean) 68
$\rho(X,Y)$ (correlation) 69
$\sigma_X$ (standard deviation) 69
E(X) (expectation) 68
Cov(X,Y) (covariance) 69
Var(X) (variance) 69
P(A|B) (conditional probability) 115


Set notation
$\sim A$ (complement) 14, 45
$\in$ and $\notin$ (in and not in) 14
$A'$ (complement) 14, 45
$A - B$ (difference) 14, 45
$A \cap B$ (intersection) 14, 45
$A \cup B$ (union) 14, 45
$A \oplus B$ (symmetric difference) 45
$A \setminus B$ (difference) 14, 45
$A \subseteq B$ (subset) 14
$A \times B$ (Cartesian product) 4, 45
$A^c$ (complement) 14, 45
$\mathcal{P}_k(A)$ ($k$-subsets of $A$) 15, 45
$|A|$ (cardinality) 3, 14
Subject Index


Absorption rule 15
Adjacent vertices 149
Algebraic rules for sets 15
Algorithm
backtracking 95
divide and conquer 104, 191
Kruskal’s (minimum weight spanning tree) 179
lineal (= depth-first) spanning tree 179
partial 191
polynomial time (tractable) 189
Prim’s (minimum weight spanning tree) 178
which is faster? 189
Antisymmetric binary relation 170
Associative rule 15
Asymptotic 186
Average running time 188


Backtracking 95
Base (simplest) cases for induction 131
Bayes’ Theorem 116, 120
Bell numbers 27
Bicomponents 168
Biconnected components 168
Bijection 47
Binary relation 151
antisymmetric 170
covering 170
equivalence relation 151
order relation 170
reflexive 151
symmetric 151
transitive 151
Binary tree 182
full 182
Binomial coefficients 15
recursion 23
Binomial distribution 78
Binomial theorem 18
Bipartite graph 169
cycle lengths of 180
Blocks of a partition 20, 25, 59
Boolean variables 125
Breadth first vertex (edge) sequence 96, 175


Card hands
and multinomial coefficients 23
full house 19
straight 26
two pairs 19
Cardinality 3
Cardinality of a set 14
Cartesian product 4, 45
Central Limit Theorem 82
Characteristic equation 137
Characteristic function 111
Chebyshev’s inequality 71
Child vertex 90, 173
Chromatic number 188, 191
Circuit in a graph 164
Eulerian 167
Clique 190
Clique problem 190
Codomain (range) of a function 46
Coimage of a function 58
Coloring a graph 188, 191
Coloring problem 190
Commutative rule 15
Comparing algorithms 189
Complement of a set 45
Complete simple graph 162
Component connected 165
Composition of an integer 8 
Composition of functions 51 
Conditional probability 115 
Conjunctive normal form 125 
Connected components 165 
Correlation 69 

Covariance 69 

Covering relation 170 


Cycle in a graph 164
Hamiltonian 167 


Cycle in a permutation 53 


Decision tree 89 
see also Rooted tree 
Monty Hall 122 
ordered tree is equivalent 173 
probabilistic 118 
RP-tree is equivalent 173 
Towers of Hanoi 106 
traversals 96, 174 


Decreasing (strictly) function or 


list 61 
Decreasing (weakly) function or 
list 61 


Degree of a vertex 90, 150 
Degree sequence of a graph 150 
DeMorgan’s rule 15 

Density function 66 


Depth first vertex (edge) 
sequence 96, 175 


Derangement 56 


Deviation 
standard 69 


Dictionary order 4 


Digraph 161 
functional 176 


Direct (Cartesian) product 4, 45 


Direct insertion order for 
permutations 94 


Directed graph 161 




Directed loop 161 
Disjunctive normal form 126 


Distribution 66 
binomial 78 
hypergeometric 32 
joint 72 
marginal 72 
normal 80 
Poisson 79 
uniform 28 


Distribution function 
see Distribution 


Distributive rule 15 

Divide and conquer 104, 191 
Domain of a function 46 
Domino covering 99 

Double negation rule 15 


Down degree of a vertex 90 


Edge 90, 148 

directed 161 

incident on vertex 149 

loop 150, 157 

parallel 157 
Edge sequence 

breadth first 96, 175 

depth first 96, 175 
Elementary event 29 
Envelope game 46 
Equation 

characteristic 137 
Equivalence class 151 
Equivalence relation 151 
Error 

percentage 10 

relative 10 
Eulerian circuit or trail 167 
Event 28, 65 

elementary=simple 29 

independent pair 73, 116 




Expectation of a random 
variable 68 


Factorial 
falling 9 


Factorial estimate (Stirling’s 
formula) 10 


Falling factorial $(n)_k$ 9
Fibonacci recursion 137 
First Moment Method 126 
Full binary tree 182 


Function 

bijection 47 

characteristic 111 

codomain (range) of 46 

coimage of 58 

composition of 51 

decreasing: decision tree 102 

density 66 

distribution, see Distribution 

domain of 46 

generating 16 

image of 58 

image of and Stirling numbers 
(set partitions) 59 

injective (one-to-one) 47 

inverse 47 

inverse image of 58 

monotone 61 

one-line notation 46 

partial 91 

probability 65 

range of 46 

restricted growth and set 
partitions 64 

strictly decreasing 61 

strictly increasing 61 

surjective (onto) 47 

two-line notation 49 

weakly decreasing 61 

weakly increasing 61 


Functional relation 48 


Gambler’s ruin problem 140 
Generating function 16
Geometric probability 34 
Geometric series 139 


Graph 149 

see also specific topic 

biconnected 168 

bipartite 169 

bipartite and cycle lengths 180 

complete simple 162 

connected 165, 165 

directed 161 

incidence function 149 

induced subgraph (by edges or 
vertices) 164 

isomorphism 153 

oriented simple 170 


random 154 
rooted 173 
simple 148 


subgraph of 163 
Gray code for subsets 111 


Growth 
rate of, see Rate of growth 


Hamiltonian cycle 167 

Hasse diagram 170 

Height of a tree 182 

Height of a vertex 91 
Hypergeometric probability 32 


Idempotent rule 15 
Identity permutation 51 


Image of a function 58 
Stirling numbers (set partitions) 
and 59 


Incidence function of a graph 149 
Inclusion and exclusion 31, 39 


Increasing (strictly) function or 


list 61 
Increasing (weakly) function or 
list 61 


Independent events 73, 116 


Independent random variables 73 


Induced subgraph (by edges or 
vertices) 164 


Induction 52, 130 
base (simplest) cases 131 
induction hypothesis 131 
inductive step 131 


Inequality 
Tchebycheff 71 


Injection 47 

Internal vertex 90, 173 
Intersection of sets 45 

Inverse image of a function 58 
Involution 54 

Isolated vertex 157 

Isomorph rejection 102 


Isomorphic graphs 153 


Joint distribution function 72 


Kruskal’s algorithm for minimum 
weight spanning tree 179 


Leaf vertex 90, 173 


rank of 92 
Lexicographic order (lex order) 4 
List 2 


circular 10 
strictly decreasing 61 
strictly increasing 61 
weakly decreasing 61 
weakly increasing 61 
with repetition 3 
without repetition 3, 9 
without repetition are 
injections 47 
Little oh notation 186 


Local description 104 
Gray code for subsets 113 
merge sorting 103 
permutations in lex order 105 
Towers of Hanoi 107 




Loop 150, 157 
directed 161 


Machine independence 184 
Marginal distribution 72 


Matrix 
permutation 55 


Merge sorting 103, 192 
Merging sorted lists 103 
Monotone function 61 
Multinomial coefficient 20 


Multiset 3 
and monotone function 61 


Nondecreasing function or list 61 
Nonincreasing function or list 61 
Normal distribution 80 


Normal form 
conjunctive 125 
disjunctive 126 


NP-complete problem 190 
NP-easy problem 190 
NP-hard problem 190 


Numbers 
Bell 27 
binomial coefficients 15 
Fibonacci 137 
Stirling (set partitions) 25, 59 


Odds 32 

One-line notation 46 

One-to-one function (injection) 47 
Onto function (surjection) 47 
Order 


direct insertion for 
permutations 94 
lexicographic (lex) 4 
Order relation 170 




Oriented simple graph 170 


Parallel edges 157 
Parent vertex 90, 173 
Partial function 91 


Partition 
set 25, 58 
set (ordered) 20 
set and restricted growth 
function 64 


Path in a (directed) graph 162 


Permutation 3, 47, 51 
cycle 53 
cycle form 53 
cycle length 53 
derangement 56 
direct insertion order 94 
identity 51 
involution 54 
is a bijection 47 
matrix 55 
powers of 51 
random generation 77 


Poisson distribution 79 
Polynomial multiplication 194 
Polynomial time algorithm 
(tractable) 189 
Postorder sequence of vertices 96 
Preorder sequence of vertices 96 
Prime factorization 131 
Prim’s algorithm for minimum 
weight spanning tree 178 
Probabilistic decision tree 118 
Probability 
conditional 115 
conditional and decision 
trees 118 


function 28 
probability space 28 


Probability distribution function 
see Distribution 

Probability function 28, 65 
see also Distribution 
Probability space 28, 65
see also Distribution 


Random generation of 
permutations 77 


Random graphs 154 


Random variable 66 
binomial 78 
correlation of two 69 
covariance of two 69 
independent pair 73 
standard deviation of 69 
variance of 69 


Range of a function 46 
Rank (of a leaf) 92 


Rate of growth 
Big oh notation 184 
comparing 189 
exponential 189 
little oh notation 186 
polynomial 186, 189 
Theta notation 184 


Rearranging words 20 


Recurrence 
see Recursion 


Recursion 132 
see also Recursive procedure 
binomial coefficients 23 
Fibonacci 137 
guessing solutions 134 
inductive proofs and 130 
set partitions (Bell numbers) 27 
set partitions (Stirling 

numbers) 25 

sum of first n integers 132 


Recursive equation 
see Recursion 


Recursive procedure 
see also Recursion 
0-1 sequences 103 
Gray code for subsets 113 
merge sorting 103 
permutations in lex order 105 
Towers of Hanoi 107 


Reflexive relation 151 


Relation 48 
see perhaps Binary relation 


Relative error 10 


Restricted growth function and set 
partitions 64 


Root 90 
Rooted graph 173 


Rooted tree 
child 90, 173 
down degree of a vertex 90 
height of a vertex 91 
internal vertex 90, 173 
leaf 90, 173 
parent 90, 173 
path to a vertex 91 
siblings 173 

RP-tree (rooted plane tree) 

see Decision tree 


Rule 
absorption 15 
associative 15 
commutative 15 
DeMorgan’s 15 
distributive 15 
double negation 15 
idempotent 15 

Rule of Product 3 


Rule of Sum 5 


Sample space 28, 65 

SAT problem 126 
Satisfiability problem 126 
Sequence 2 


Series 
geometric 139 


Set 2,14 
algebraic rules 15 
and monotone function 61 
cardinality 3 
cardinality of 14 
Cartesian product 14 
complement 14 
complement of 45 
difference 14 
intersection 14 
intersection of two 45 
partition, see Set partition 
subset 14 
subsets of size k 15 
symmetric difference 14 
symmetric difference of two 45 
union 14 
union of two 45 
with repetition (multiset) 3 


Set partition 25, 58 
ordered 20 
recursion (Bell numbers) 27 
recursion (Stirling numbers) 25 
restricted growth function 64 
Simple event 29 
Simple graph 148 
Simplest (base) cases for 
induction 131 
Sorting (merge sort) 103, 192 


Space 
probability 28 


Spanning tree 177 
lineal (= depth first) 180 
minimum weight 177 


Stacks and recursion 109 
Standard deviation 69 


Stirling numbers (set partitions) 25 
image of a function 59 


Stirling’s approximation for n! 10 


Strictly decreasing function or 
list 61 

Strictly increasing (or decreasing) 
function or list 61 


Strictly increasing function or 
list 61 




String 
see List 
Subgraph 163 
cycle 164 
induced by edges or vertices 164 


Subset of a set 14 
Surjection 47 
Symmetric difference of sets 45 


Symmetric relation 151 


Tchebycheff’s inequality 71 


Theorem 

Bayes’ 116, 120 

binomial coefficients 16 

binomial theorem 18 

bipartite and cycle lengths 180 

Central Limit 82 

conditional probability 116 

correlation bounds 70 

covariance when independent 76 

cycles and multiple paths 165 

equivalence relations 151 

expectation is linear 68 

expectation of a product 76 

induction 130 

lists with repetition 3 

lists without repetition 9 

minimum weight spanning 
tree 178 

monotone functions and 
(multi)sets 62 

permutations of set to fixed 
power 54 

Prim’s algorithm 178 

properties of $\Theta$ and $O$ 185

Rule of Product 3 

Rule of Sum 5 

Stirling’s formula 10 

systematic tree traversal 97 

Tchebycheff’s inequality 71 

variance of sum 76 

walk, trail and path 163 


Towers of Hanoi 106 
four pole version 114 


Tractable algorithm 190 


Trail in a (directed) graph 162 
Transitive relation 151 


Traveling salesman problem 190 


Traversal 
decision tree 96, 174 
Tree 
see also specific topic 
binary 182 
decision, see Decision tree 
height 182 


ordered tree, see Decision tree 

rooted, see Rooted tree 

RP-tree (rooted plane tree), see 
Decision tree 

spanning 177 

spanning, lineal (= depth 
first) 180 

spanning, minimum weight 177 


Two-line notation 49 


Uniformly at random 28 
Union of sets 45 


Variance 69 
Venn diagram 31 


Vertex 90 
adjacent pair 149 
child 90, 173 
degree of 90, 150 
down degree of 90 
height of 91 
internal 90, 173 
isolated 157 
leaf 90, 173 
parent 90, 173 


Vertex sequence 162 
breadth first 96, 175 
depth first 96, 175 


Walk in a graph 162


Weakly decreasing function or 
list 61 




Weakly increasing function or 
list 61 


Words 11, 20 